Preface to the Second Edition


In this new edition, which is a substantially revised version of the old one, I have added five new chapters: Vectors in Relativity (Chapter 8), Tensor Analysis (Chapter 17), Integral Transforms (Chapter 29), Calculus of Variations (Chapter 30), and Probability Theory (Chapter 32). The discussion of vectors in Part II, especially the introduction of the inner product, offered the opportunity to present the special theory of relativity, which, unfortunately, receives little attention in most undergraduate physics curricula. While the main motivation for this chapter was vectors, I grabbed the opportunity to develop the Lorentz transformation and Minkowski distance, the bedrocks of the special theory of relativity, from first principles.

The short section, Vectors and Indices, at the end of Chapter 8 of the first edition, was too short to demonstrate the importance of what the indices are really used for: tensors. So, I expanded that short section into a somewhat comprehensive discussion of tensors. Chapter 17, Tensor Analysis, takes a fresh look at vector transformations introduced in the earlier discussion of vectors, and shows the necessity of classifying them into the covariant and contravariant categories. It then introduces tensors based on, and as a generalization of, the transformation properties of covariant and contravariant vectors. In light of these transformation properties, the Kronecker delta, introduced earlier in the book, takes on a new look, and a natural and extremely useful generalization of it is introduced, leading to the Levi-Civita symbol. A discussion of connections and metrics motivates a four-dimensional treatment of Maxwell’s equations and a manifest unification of electric and magnetic fields. The chapter ends with the Riemann curvature tensor and its place in Einstein’s general relativity.

The Fourier series treatment alone does not do justice to the many applications in which aperiodic functions are to be represented. The Fourier transform is a powerful tool to represent functions in such a way that the solutions to many (partial) differential equations can be obtained elegantly and succinctly. Chapter 29, Integral Transforms, shows the power of the Fourier transform in many illustrations, including the calculation of Green’s functions for the Laplace, heat, and wave differential operators. Laplace transforms, which are useful in solving initial-value problems, are also included.
The Dirac delta function, about which there is a comprehensive discussion in the book, allows a very smooth transition from multivariable calculus to the Calculus of Variations, the subject of Chapter 30. This chapter takes an intuitive approach to the subject: replace the sum by an integral and the Kronecker delta by the Dirac delta function, and you get from multivariable calculus to the calculus of variations! Well, the transition may not be as simple as this, but the heart of the intuitive approach is. Once the transition is made and the master Euler-Lagrange equation is derived, many examples are presented, including some with constraints (which use the Lagrange multiplier technique) and some from electromagnetism and mechanics.

Probability Theory is essential for quantum mechanics and thermodynamics. This is the subject of Chapter 32. Starting with the basic notion of the probability space, whose prerequisite is an understanding of elementary set theory (which is also included), the chapter introduces random variables and their connection to probability, defines average and variance, and discusses the binomial, Poisson, and normal distributions in some detail.

Aside from the above major changes, I have also incorporated some other important changes, including the rearrangement of some chapters, the addition of new sections and subsections to some existing chapters (for instance, the dynamics of fluids in Chapter 15), the correction of all the mistakes, both typographic and conceptual, to which many readers of the first edition have directed me, and the addition of more problems at the end of each chapter. Stylistically, I thought that splitting the sometimes very long chapters into smaller ones and collecting the related chapters into Parts would make the reading of the text smoother. I hope I was not wrong!

I would like to thank the many instructors, students, and general readers 
who communicated to me comments, suggestions, and errors they found in the 
book. Among those, I especially thank Dan Holland for the many discussions 
we have had about the book, Rafael Benguria and Gebhard Grübl for pointing
out some important historical and conceptual mistakes, and Ali Erdem and 
Thomas Ferguson for reading multiple chapters of the book, catching many 
mistakes, and suggesting ways to improve the presentation of the material. 
Jerome Brozek meticulously and diligently read most of the book and found 
numerous errors. Although a lawyer by profession, Mr. Brozek, as a hobby, 
has a keen interest in mathematical physics. I thank him for this interest and 
for putting it to use on my book. Last but not least, I want to thank my 
family, especially my wife Sarah for her unwavering support. 


S.H. 


Normal, IL 
January, 2008 




Preface 


Innocent light-minded men, who think that astronomy can
be learnt by looking at the stars without knowledge of mathematics
will, in the next life, be birds.


Plato, Timaeos 


This book is intended to help bridge the wide gap separating the level of mathematical sophistication expected of students of introductory physics from that expected of students of advanced undergraduate courses in physics and engineering. While nothing beyond simple calculus is required for the introductory physics courses taken by physics, engineering, and chemistry majors, the next level of courses (both in physics and engineering) already demands a readiness for such intricate and sophisticated concepts as divergence, curl, and Stokes’ theorem. It is the aim of this book to make the transition between these two levels of exposure as smooth as possible.


Level and Pedagogy 

I believe that the best pedagogy to teach mathematics to beginning students of physics and engineering (even mathematics, although some of my mathematical colleagues may disagree with me) is to introduce and use the concepts in a multitude of applied settings. This method is not unlike teaching a language to a child: it is by repeated usage (by the parents or the teacher) of the same word in different circumstances that a child learns the meaning of the word, and by repeated active (and sometimes wrong) usage of words that the child learns to use them in a sentence.

And what better place to use the language of mathematics than in Nature itself, in the context of physics? I start with the familiar notion of, say, a derivative or an integral, but interpret it entirely in terms of physical ideas. Thus, a derivative is a means by which one obtains velocity from position vectors or acceleration from velocity vectors, and an integral is a means by which one obtains the gravitational or electric field of a large number of massive or charged particles. If concepts (e.g., infinite series) do not succumb easily to physical interpretation, then I immediately subjugate the physical situation to the mathematical concepts (e.g., multipole expansion of electric potential).

Because of my belief in this pedagogy, I have kept formalism to a bare 
minimum. After all, a child needs no knowledge of the formalism of his or her 
language (i.e., grammar) to be able to read and write. Similarly, a novice in 
physics or engineering needs to see a lot of examples in which mathematics 
is used to be able to “speak the language.” And I have spared no effort to 
provide these examples throughout the book. Of course, formalism, at some 
stage, becomes important. Just as grammar is taught at a higher stage of a 
child’s education (say, in high school), mathematical formalism is to be taught 
at a higher stage of education of physics and engineering students (possibly 
in advanced undergraduate or graduate classes). 


Features 

The unique features of this book, which set it apart from the existing textbooks, are

• the inseparable treatments of physical and mathematical concepts, 

• the large number of original illustrative examples, 

• the accessibility of the book to sophomores and juniors in physics and 
engineering programs, and 

• the large number of historical notes on people and ideas. 

All mathematical concepts in the book are either introduced as a natural tool 
for expressing some physical concept or, upon their introduction, immediately 
used in a physical setting. Thus, for example, differential equations are not 
treated as some mathematical equalities seeking solutions, but rather as a 
statement about the laws of Nature (e.g., the second law of motion) whose 
solutions describe the behavior of a physical system. 

Almost all examples and problems in this book come directly from physical situations in mechanics, electromagnetism, and, to a lesser extent, quantum mechanics and thermodynamics. Although the examples are drawn from physics, they are conceptually at such an introductory level that students of engineering and chemistry will have no difficulty benefiting from the mathematical discussion involved in them.

Most mathematical-methods books are written for readers with a higher level of sophistication than a sophomore or junior physics or engineering student. This book is directly and precisely targeted at sophomores and juniors, and seven years of teaching it to such an audience have proved both the need for such a book and the adequacy of its level.

My experience with sophomores and juniors has shown that peppering the mathematical topics with a bit of history makes the subject more enticing. It also gives a little boost to the motivation of many students, which at times can run very low. The history of ideas dispels the myth that all mathematical concepts are clear-cut and come into being as a finished and polished product. It reveals to the students that ideas, just like artistic masterpieces, are molded into perfection in the hands of many generations of mathematicians and physicists.


Use of Computer Algebra 

As soon as one applies the mathematical concepts to real-world situations, 
one encounters the impossibility of finding a solution in “closed form.” One 
is thus forced to use approximations and numerical methods of calculation. 
Computer algebra is especially suited for many of the examples and problems 
in this book. 

Because of the variety of computer algebra software available on the market, and the diversity of instructors’ preferences for one package over another, I have left any discussion of computers out of this book. Instead, all computer and numerical chapters, examples, and problems are collected in Mathematical Methods Using Mathematica®, a relatively self-contained companion volume that uses Mathematica®.

By separating the computer-intensive topics from the text, I have made it possible for the instructor to use his or her judgment in deciding how much and in what format the use of computers should enter his or her pedagogy. The usage of Mathematica® in the accompanying companion volume is only a reflection of my limited familiarity with the broader field of symbolic manipulation on computers. Instructors using other symbolic algebra programs such as Maple® and Macsyma® may generate their own examples or translate the Mathematica® commands of the companion volume into their favorite language.
Note to the Reader


“Why,” said the Dodo, “the best way to explain it is to do it.”


—Lewis Carroll 


Probably the best advice I can give you is, if you want to learn mathematics and physics, “Just do it!” As a first step, read the material in a chapter carefully, tracing the logical steps leading to important results. As a (very important) second step, make sure you can reproduce these logical steps, as well as all the relevant examples in the chapter, with the book closed. No amount of following other people’s logic, whether in a book or in a lecture, can help you learn as much as a single logical step that you have taken yourself. Finally, do as many problems at the end of each chapter as your devotion and dedication to this subject allow!

Whether you are a physics or an engineering student, almost all the material you learn in this book will come in handy in the rest of your academic training. Eventually, you are going to take courses in mechanics, electromagnetic theory, strength of materials, heat and thermodynamics, quantum mechanics, etc. A solid background in the mathematical methods at the level of presentation of this book will go a long way toward your deeper understanding of these subjects.

As you strive to grasp the (sometimes) difficult concepts, glance at the historical notes to appreciate the efforts of the past mathematicians and physicists as they struggled through a maze of uncharted territories in search of the correct “path,” a path that demands courage, perseverance, self-sacrifice, and devotion.

At the end of most chapters, you will find a short list of references that you may want to consult for further reading. In addition to these specific references, as a general companion, I frequently refer to my more advanced book, Mathematical Physics: A Modern Introduction to Its Foundations, Springer-Verlag, 1999, which is abbreviated as [Has 99]. There are many other excellent books on the market; however, my own ignorance of their content and the parallelism in the pedagogy of my two books are the only reasons for singling out [Has 99].
Part I

Coordinates and Calculus 



Chapter 1 


Coordinate Systems and Vectors


Coordinates and vectors, in one form or another, are two of the most fundamental concepts in any discussion of mathematics as applied to physical problems. So, it is beneficial to start our study with these two concepts. Both vectors and coordinates have generalizations that cover a wide variety of physical situations, including not only ordinary three-dimensional space with its ordinary vectors, but also the four-dimensional spacetime of relativity with its so-called four-vectors, and even the infinite-dimensional spaces used in quantum physics with their vectors of infinitely many components. Our aim in this chapter is to review ordinary space and how it is used to describe physical phenomena. To facilitate this discussion, we first give an outline of some of the properties of vectors.


1.1 Vectors in a Plane and in Space 

We start with the most common definition of a vector as a directed line segment without regard to where the vector is located. In other words, a vector is a directed line segment whose only important attributes are its direction and its length. As long as we do not change these two attributes, the vector is not affected. Thus, we are allowed to move a vector parallel to itself without changing the vector. Examples of vectors¹ are position r, displacement Δr, velocity v, momentum p, electric field E, and magnetic field B. The vector that has no length is called the zero vector and is denoted by 0.

Vectors would be useless unless we could perform some kind of operation on them. The most basic operation is changing the length of a vector. This is accomplished by multiplying the vector by a positive real number. For example, 3.2r is a vector in the same direction as r but 3.2 times longer. We can flip the direction of a vector by multiplying it by −1. That is, $(-1)\,\mathbf{r} = -\mathbf{r}$ is a vector having the same length as r but pointing in the opposite direction. We can combine these two operations and think of multiplying a vector by any real (positive or negative) number. The result is another vector lying along the same line as the original vector. Thus, −0.732r is a vector that is 0.732 times as long as r and points in the opposite direction. The zero vector is obtained every time one multiplies any vector by the number zero.

1 Vectors will be denoted by Roman letters printed in boldface type.

Figure 1.1: Illustration of the commutative law of addition of vectors.

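Another operation is the addition of two vectors. This operation, with which we assume the reader to have some familiarity, is inspired by the obvious addition law for displacements. In Figure 1.1(a), a displacement $\Delta\mathbf{r}_1$ from A to B is added to the displacement $\Delta\mathbf{r}_2$ from B to C to give $\Delta\mathbf{R}$, their resultant or sum, i.e., the displacement from A to C: $\Delta\mathbf{r}_1 + \Delta\mathbf{r}_2 = \Delta\mathbf{R}$. Figure 1.1(b) shows that addition of vectors is commutative: $\mathbf{a} + \mathbf{b} = \mathbf{b} + \mathbf{a}$. It is also associative, $\mathbf{a} + (\mathbf{b} + \mathbf{c}) = (\mathbf{a} + \mathbf{b}) + \mathbf{c}$, i.e., the grouping of the vectors being added is irrelevant; together with commutativity, this means that the order in which you add vectors does not matter. It is clear that $\mathbf{a} + \mathbf{0} = \mathbf{0} + \mathbf{a} = \mathbf{a}$ for any vector a.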

Example 1.1.1. The parametric equation of a line through two given points can be obtained in vector form by noting that any point in space defines a vector whose components are the coordinates of the given point.² If the components of the points P and Q in Figure 1.2 are, respectively, $(p_x, p_y, p_z)$ and $(q_x, q_y, q_z)$, then we can define vectors p and q with those components. An arbitrary point X with components $(x, y, z)$ will lie on the line PQ if and only if the vector $\mathbf{x} = (x, y, z)$ has its tip on that line. This will happen if and only if the vector joining P and X, namely $\mathbf{x} - \mathbf{p}$, is proportional to the vector joining P and Q, namely $\mathbf{q} - \mathbf{p}$. Thus, for some real number t, we must have

$$\mathbf{x} - \mathbf{p} = t(\mathbf{q} - \mathbf{p}) \quad\text{or}\quad \mathbf{x} = t(\mathbf{q} - \mathbf{p}) + \mathbf{p}.$$

This is the vector form of the equation of a line. We can write it in component form by noting that the equality of vectors implies the equality of corresponding components. Thus,

$$x = (q_x - p_x)t + p_x, \qquad y = (q_y - p_y)t + p_y, \qquad z = (q_z - p_z)t + p_z,$$

which is the usual parametric equation for a line. ■
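The vector form $\mathbf{x} = t(\mathbf{q} - \mathbf{p}) + \mathbf{p}$ translates almost word for word into code. The Python sketch below is only an illustration. It assumes that points are plain 3-tuples of coordinates, and the function name `point_on_line` and the sample points are hypothetical choices.

```python
def point_on_line(p, q, t):
    """Return the point x = t(q - p) + p on the line through p and q.

    p and q are 3-tuples of coordinates; t is the real parameter.
    """
    return tuple(t * (qi - pi) + pi for pi, qi in zip(p, q))


P = (1, 2, 3)
Q = (4, 6, 3)
print(point_on_line(P, Q, 0))    # t = 0 gives P itself
print(point_on_line(P, Q, 1))    # t = 1 gives Q
print(point_on_line(P, Q, 0.5))  # t = 1/2 gives the midpoint of PQ
```

Values of t between 0 and 1 give points of the segment PQ. Values outside that interval give points on the line beyond P or beyond Q.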

2 We shall discuss components and coordinates in greater detail later in this chapter. For 
now, the knowledge gained in calculus is sufficient for our discussion. 
Figure 1.2: The parametric equation of a line in space can be obtained easily using vectors.

There are some special vectors that are extremely useful in describing physical quantities. These are the unit vectors. If one divides a vector by its length, one gets a unit vector in the direction of the original vector. Unit vectors are generally denoted by the symbol e with a subscript that designates their direction. Thus, if we divide the vector a by its length |a|, we get the unit vector $\mathbf{e}_a$ in the direction of a. Turning this definition around, we have

Box 1.1.1. If we know the magnitude |a| of a vector quantity as well as its direction $\mathbf{e}_a$, we can construct the vector: $\mathbf{a} = |\mathbf{a}|\,\mathbf{e}_a$.


This construction will be used often in the sequel. 
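Normalizing a vector and Box 1.1.1 are inverse operations. The Python sketch below illustrates this. It assumes vectors are given as tuples of Cartesian components and computes lengths with the component formula for |a| derived later in this section. The function names are illustrative only.

```python
import math


def length(a):
    """Length |a| of a vector given by its Cartesian components."""
    return math.sqrt(sum(ai * ai for ai in a))


def unit_vector(a):
    """Unit vector e_a = a / |a| in the direction of a."""
    size = length(a)
    if size == 0:
        # The zero vector has no direction, so e_a is undefined.
        raise ValueError("cannot normalize the zero vector")
    return tuple(ai / size for ai in a)


def from_magnitude_and_direction(magnitude, e):
    """Box 1.1.1: rebuild a = |a| e_a from a magnitude and a unit vector."""
    return tuple(magnitude * ei for ei in e)


a = (3.0, 0.0, 4.0)
e_a = unit_vector(a)
print(length(a), e_a)
print(from_magnitude_and_direction(length(a), e_a))
```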

The most commonly used unit vectors are those in the direction of the coordinate axes. Thus $\mathbf{e}_x$, $\mathbf{e}_y$, and $\mathbf{e}_z$ are the unit vectors pointing in the positive directions of the x-, y-, and z-axes, respectively.³ We shall introduce unit vectors in other coordinate systems when we discuss those coordinate systems later in this chapter.

1.1.1 Dot Product 

The reader is no doubt familiar with the concept of the dot product, whereby two vectors are “multiplied” and the result is a number. The dot product of a and b is defined by

$$\mathbf{a}\cdot\mathbf{b} = |\mathbf{a}|\,|\mathbf{b}|\cos\theta, \qquad (1.1)$$

where |a| is the length of a, |b| is the length of b, and θ is the angle between the two vectors. This definition is motivated by many physical situations.

3 These unit vectors are usually denoted by i, j, and k, a notation that can be confusing when other non-Cartesian coordinates are used. We shall not use this notation, but adhere to the more suggestive notation introduced above.
Figure 1.3: No work is done by a force orthogonal to displacement. If such work were not zero, it would have to be positive or negative; but no consistent rule exists to assign a sign to the work.

The prime example is work, which is defined as the scalar product of force and displacement. The presence of cos θ ensures that the work done by a force perpendicular to the displacement is zero. If this requirement were not met, we would have the precarious situation of Figure 1.3, in which the two vertical forces add up to zero but the total work done by them is not zero! This is because it would be impossible to assign a “sign” to the work done by forces being displaced perpendicular to themselves, and to make the rule of such an assignment in such a way that the work of F in the figure cancels that of N. (The reader is urged to try to come up with a rule, e.g., assigning a positive sign to the work if the velocity points to the right of the observer and a negative sign if it points to the observer’s left, and see that it will not work, no matter how elaborate it may be!) The only logical definition of work is one that includes a cos θ factor.

The dot product is clearly commutative, $\mathbf{a}\cdot\mathbf{b} = \mathbf{b}\cdot\mathbf{a}$. Moreover, it distributes over vector addition:

$$(\mathbf{a} + \mathbf{b})\cdot\mathbf{c} = \mathbf{a}\cdot\mathbf{c} + \mathbf{b}\cdot\mathbf{c}.$$

To see this, note that Equation (1.1) can be interpreted as the product of the length of a with the projection of b along a. Now Figure 1.4 demonstrates⁴ that the projection of a + b along c is the sum of the projections of a and b along c (see Problem 1.2 for details). The third property of the inner product is that $\mathbf{a}\cdot\mathbf{a}$ is always a positive number unless a is the zero vector, in which case $\mathbf{a}\cdot\mathbf{a} = 0$. In mathematics, the collection of these three properties (commutativity, positivity, and distribution over addition) defines a dot (or inner) product on a vector space.

The definition of the dot product leads directly to $\mathbf{a}\cdot\mathbf{a} = |\mathbf{a}|^2$, or

$$|\mathbf{a}| = \sqrt{\mathbf{a}\cdot\mathbf{a}}, \qquad (1.2)$$

which is useful in calculating the length of sums or differences of vectors.
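As an illustration of this use, apply Equation (1.2) to a + b. Using commutativity and distributivity, and letting θ be the angle between a and b, we obtain a form of the law of cosines:

$$|\mathbf{a}+\mathbf{b}|^2 = (\mathbf{a}+\mathbf{b})\cdot(\mathbf{a}+\mathbf{b}) = \mathbf{a}\cdot\mathbf{a} + 2\,\mathbf{a}\cdot\mathbf{b} + \mathbf{b}\cdot\mathbf{b} = |\mathbf{a}|^2 + |\mathbf{b}|^2 + 2|\mathbf{a}|\,|\mathbf{b}|\cos\theta,$$

so that $|\mathbf{a}+\mathbf{b}| = \sqrt{|\mathbf{a}|^2 + |\mathbf{b}|^2 + 2|\mathbf{a}|\,|\mathbf{b}|\cos\theta}$.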

4 Figure 1.4 appears to prove the distributive property only for vectors lying in the same 
plane. However, the argument will be valid even if the three vectors are not coplanar. 
Instead of dropping perpendicular lines from the tips of a and b, one drops perpendicular 
planes. 
Figure 1.4: The distributive property of the dot product is clearly demonstrated if we interpret the dot product as the length of one vector times the projection of the other vector on the first.


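One can use the distributive property of the dot product, together with the fact that the unit vectors along the axes are mutually perpendicular and of unit length, to show that if $(a_x, a_y, a_z)$ and $(b_x, b_y, b_z)$ represent the components of a and b along the axes x, y, and z, then

$$\mathbf{a}\cdot\mathbf{b} = a_x b_x + a_y b_y + a_z b_z. \qquad (1.3)$$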
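Equations (1.1) to (1.3) can be combined to find the angle between two vectors from their components. The Python sketch below assumes vectors are tuples of Cartesian components, and its helper names are hypothetical. The clamp before `math.acos` is a numerical safeguard and is not part of the mathematics.

```python
import math


def dot(a, b):
    """Dot product from Cartesian components, Equation (1.3)."""
    return sum(ai * bi for ai, bi in zip(a, b))


def length(a):
    """Length of a vector, Equation (1.2): |a| = sqrt(a . a)."""
    return math.sqrt(dot(a, a))


def angle_between(a, b):
    """Angle theta in radians between nonzero a and b, from Equation (1.1)."""
    cos_theta = dot(a, b) / (length(a) * length(b))
    # Round-off can push cos_theta slightly outside [-1, 1]; clamp it for acos.
    cos_theta = max(-1.0, min(1.0, cos_theta))
    return math.acos(cos_theta)


print(dot((1, 1, -2), (2, 0, 3)))
print(math.degrees(angle_between((1, 0, 0), (1, 1, 0))))
```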


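From the definition of the dot product, we can draw an important conclusion. If we divide both sides of $\mathbf{a}\cdot\mathbf{b} = |\mathbf{a}|\,|\mathbf{b}|\cos\theta$ by |a|, we get

$$\frac{\mathbf{a}\cdot\mathbf{b}}{|\mathbf{a}|} = |\mathbf{b}|\cos\theta \quad\text{or}\quad \mathbf{e}_a\cdot\mathbf{b} = |\mathbf{b}|\cos\theta.$$

Noting that |b| cos θ is simply the projection of b along a, we conclude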

Box 1.1.2. To find the perpendicular projection of a vector b along another vector a, take the dot product of b with $\mathbf{e}_a$, the unit vector along a.


Sometimes “component” is used for perpendicular projection. This is not 
entirely correct. For any set of three mutually perpendicular unit vectors in 
space, Box 1.1.2 can be used to find the components of a vector along the 
three unit vectors. Only if the unit vectors are mutually perpendicular do 
components and projections coincide. 


1.1.2 Vector or Cross Product 

Given two space vectors, a and b, we can find a third space vector c, called the cross product of a and b, and denoted by $\mathbf{c} = \mathbf{a}\times\mathbf{b}$. The magnitude of c is defined by $|\mathbf{c}| = |\mathbf{a}|\,|\mathbf{b}|\sin\theta$, where θ is the angle between a and b. The direction of c is given by the right-hand rule: if a is turned to b (note the order in which a and b appear here) through the angle between a and b, a (right-handed) screw that is perpendicular to a and b will advance in the direction of a × b. This definition implies that

$$\mathbf{a}\times\mathbf{b} = -\mathbf{b}\times\mathbf{a}.$$

This property is described by saying that the cross product is antisymmetric. The definition also implies that

$$\mathbf{a}\cdot(\mathbf{a}\times\mathbf{b}) = \mathbf{b}\cdot(\mathbf{a}\times\mathbf{b}) = 0.$$

That is, a × b is perpendicular to both a and b.⁵
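The vector product has the following properties:

$$\mathbf{a}\times(\alpha\mathbf{b}) = (\alpha\mathbf{a})\times\mathbf{b} = \alpha(\mathbf{a}\times\mathbf{b}), \qquad \mathbf{a}\times\mathbf{b} = -\mathbf{b}\times\mathbf{a},$$

$$\mathbf{a}\times(\mathbf{b}+\mathbf{c}) = \mathbf{a}\times\mathbf{b} + \mathbf{a}\times\mathbf{c}, \qquad \mathbf{a}\times\mathbf{a} = \mathbf{0}. \qquad (1.4)$$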

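Using these properties, we can write the vector product of two vectors in terms of their components. We are interested in a more general result valid in other coordinate systems as well. So, rather than using x, y, and z as subscripts for unit vectors, we use the numbers 1, 2, and 3. In that case, our results can also be used for spherical and cylindrical coordinates, which we shall discuss shortly. Writing $\mathbf{a} = \alpha_1\mathbf{e}_1 + \alpha_2\mathbf{e}_2 + \alpha_3\mathbf{e}_3$ and $\mathbf{b} = \beta_1\mathbf{e}_1 + \beta_2\mathbf{e}_2 + \beta_3\mathbf{e}_3$ and using the distributive property, we get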


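$$\begin{aligned}
\mathbf{a}\times\mathbf{b} &= (\alpha_1\mathbf{e}_1 + \alpha_2\mathbf{e}_2 + \alpha_3\mathbf{e}_3)\times(\beta_1\mathbf{e}_1 + \beta_2\mathbf{e}_2 + \beta_3\mathbf{e}_3)\\
&= \alpha_1\beta_1\,\mathbf{e}_1\times\mathbf{e}_1 + \alpha_1\beta_2\,\mathbf{e}_1\times\mathbf{e}_2 + \alpha_1\beta_3\,\mathbf{e}_1\times\mathbf{e}_3\\
&\quad + \alpha_2\beta_1\,\mathbf{e}_2\times\mathbf{e}_1 + \alpha_2\beta_2\,\mathbf{e}_2\times\mathbf{e}_2 + \alpha_2\beta_3\,\mathbf{e}_2\times\mathbf{e}_3\\
&\quad + \alpha_3\beta_1\,\mathbf{e}_3\times\mathbf{e}_1 + \alpha_3\beta_2\,\mathbf{e}_3\times\mathbf{e}_2 + \alpha_3\beta_3\,\mathbf{e}_3\times\mathbf{e}_3.
\end{aligned}$$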


But, by the last property of Equation (1.4), we have

$$\mathbf{e}_1\times\mathbf{e}_1 = \mathbf{e}_2\times\mathbf{e}_2 = \mathbf{e}_3\times\mathbf{e}_3 = \mathbf{0}.$$

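Also, if we assume that $\mathbf{e}_1$, $\mathbf{e}_2$, and $\mathbf{e}_3$ form a so-called right-handed set, i.e., if

$$\begin{aligned}
\mathbf{e}_1\times\mathbf{e}_2 &= -\mathbf{e}_2\times\mathbf{e}_1 = \mathbf{e}_3,\\
\mathbf{e}_1\times\mathbf{e}_3 &= -\mathbf{e}_3\times\mathbf{e}_1 = -\mathbf{e}_2,\\
\mathbf{e}_2\times\mathbf{e}_3 &= -\mathbf{e}_3\times\mathbf{e}_2 = \mathbf{e}_1,
\end{aligned} \qquad (1.5)$$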


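then we obtain

$$\mathbf{a}\times\mathbf{b} = (\alpha_2\beta_3 - \alpha_3\beta_2)\,\mathbf{e}_1 + (\alpha_3\beta_1 - \alpha_1\beta_3)\,\mathbf{e}_2 + (\alpha_1\beta_2 - \alpha_2\beta_1)\,\mathbf{e}_3,$$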


5 This fact makes it clear why a × b is not defined in the plane. Although it is possible to define a × b for vectors a and b lying in a plane, a × b will not lie in that plane (it will be perpendicular to that plane). For the vector product, a and b (although lying in a plane) must be considered as space vectors.
Figure 1.5: A 3 × 3 determinant is obtained by writing the entries twice as shown, multiplying all terms on each slanted line and adding the results. The lines from upper left to lower right bear a positive sign, and those from upper right to lower left a negative sign.


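which can be nicely written in determinant form:⁶

$$\mathbf{a}\times\mathbf{b} = \det\begin{pmatrix} \mathbf{e}_1 & \mathbf{e}_2 & \mathbf{e}_3\\ \alpha_1 & \alpha_2 & \alpha_3\\ \beta_1 & \beta_2 & \beta_3 \end{pmatrix}. \qquad (1.6)$$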

Figure 1.5 explains the rule for “expanding” a determinant. 
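Expanding the determinant in Equation (1.6) gives the three components directly. The Python sketch below assumes the components are given with respect to a right-handed set of mutually perpendicular unit vectors, and the sample vectors are arbitrary. It reproduces Equation (1.5) for the unit vectors themselves and checks antisymmetry and perpendicularity.

```python
def cross(a, b):
    """Cross product a x b from Equation (1.6).

    a = (alpha1, alpha2, alpha3) and b = (beta1, beta2, beta3) are components
    along a right-handed set of mutually perpendicular unit vectors e1, e2, e3.
    """
    a1, a2, a3 = a
    b1, b2, b3 = b
    return (a2 * b3 - a3 * b2,
            a3 * b1 - a1 * b3,
            a1 * b2 - a2 * b1)


def dot(a, b):
    """Dot product from components."""
    return sum(ai * bi for ai, bi in zip(a, b))


e1, e2, e3 = (1, 0, 0), (0, 1, 0), (0, 0, 1)
print(cross(e1, e2), cross(e1, e3), cross(e2, e3))  # reproduces Equation (1.5)

a, b = (1, 2, 3), (-4, 0, 5)
c = cross(a, b)
print(c, cross(b, a))        # antisymmetry: b x a = -(a x b)
print(dot(a, c), dot(b, c))  # a x b is perpendicular to both a and b
```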

Example 1.1.2. From the definition of the vector product and Figure 1.6(a), we note that

|a × b| = area of the parallelogram defined by a and b.

So we can use Equation (1.6) to find the area of a parallelogram defined by two vectors directly in terms of their components. For instance, the area defined by a = (1, 1, −2) and b = (2, 0, 3) can be found by calculating their vector product

$$\mathbf{a}\times\mathbf{b} = \det\begin{pmatrix} \mathbf{e}_1 & \mathbf{e}_2 & \mathbf{e}_3\\ 1 & 1 & -2\\ 2 & 0 & 3 \end{pmatrix} = 3\mathbf{e}_1 - 7\mathbf{e}_2 - 2\mathbf{e}_3,$$

and then computing its length

$$|\mathbf{a}\times\mathbf{b}| = \sqrt{3^2 + (-7)^2 + (-2)^2} = \sqrt{62}.$$ ■
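The computation of Example 1.1.2 can be checked numerically. In the Python sketch below, `parallelogram_area` is a hypothetical helper that returns the length of the cross product given by Equation (1.6).

```python
import math


def cross(a, b):
    """Cross product from Equation (1.6), components along e1, e2, e3."""
    a1, a2, a3 = a
    b1, b2, b3 = b
    return (a2 * b3 - a3 * b2, a3 * b1 - a1 * b3, a1 * b2 - a2 * b1)


def parallelogram_area(a, b):
    """Area of the parallelogram with sides a and b: |a x b|."""
    c = cross(a, b)
    return math.sqrt(sum(ci * ci for ci in c))


a, b = (1, 1, -2), (2, 0, 3)
print(cross(a, b))               # components of 3e1 - 7e2 - 2e3
print(parallelogram_area(a, b))  # numerical value of sqrt(62)
```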




Figure 1.6: (a) The area of a parallelogram is the magnitude of the cross product of the two vectors describing its sides. (b) The volume of a parallelepiped can be obtained by mixing the dot and the cross products.

6 No knowledge of determinants is necessary at this point. The reader may consider (1.6) to be a mnemonic device useful for remembering the components of a × b.
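Example 1.1.3. The volume of a parallelepiped defined by three non-coplanar vectors, a, b, and c, is given by $|\mathbf{a}\cdot(\mathbf{b}\times\mathbf{c})|$. This can be seen from Figure 1.6(b), where it is clear that

$$\text{volume} = (\text{area of base})(\text{altitude}) = |\mathbf{b}\times\mathbf{c}|\,\big(|\mathbf{a}|\,|\cos\theta|\big) = |(\mathbf{b}\times\mathbf{c})\cdot\mathbf{a}|,$$

where θ is the angle between a and b × c. The absolute value is taken to ensure the positivity of the volume. In terms of components, with $\mathbf{a} = (\alpha_1, \alpha_2, \alpha_3)$, $\mathbf{b} = (\beta_1, \beta_2, \beta_3)$, and $\mathbf{c} = (\gamma_1, \gamma_2, \gamma_3)$, we have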

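$$\begin{aligned}
\text{volume} &= |(\mathbf{b}\times\mathbf{c})_1\alpha_1 + (\mathbf{b}\times\mathbf{c})_2\alpha_2 + (\mathbf{b}\times\mathbf{c})_3\alpha_3|\\
&= |(\beta_2\gamma_3 - \beta_3\gamma_2)\alpha_1 + (\beta_3\gamma_1 - \beta_1\gamma_3)\alpha_2 + (\beta_1\gamma_2 - \beta_2\gamma_1)\alpha_3|,
\end{aligned}$$

which can be written in determinant form as

$$\text{volume} = |\mathbf{a}\cdot(\mathbf{b}\times\mathbf{c})| = \left|\det\begin{pmatrix} \alpha_1 & \alpha_2 & \alpha_3\\ \beta_1 & \beta_2 & \beta_3\\ \gamma_1 & \gamma_2 & \gamma_3 \end{pmatrix}\right|.$$

Note how we have put the absolute value sign around the determinant of the matrix, so that the volume comes out positive. ■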
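The Python sketch below checks Example 1.1.3 numerically by computing the volume in two ways: once as $|\mathbf{a}\cdot(\mathbf{b}\times\mathbf{c})|$ and once as the absolute value of the 3 × 3 determinant. Note that `det3` uses cofactor expansion along the first row, which is a different but equivalent way of evaluating the determinant than the slanted-line rule of Figure 1.5. The sample vectors are arbitrary. The third one is deliberately chosen as c = a + b, which makes the three vectors coplanar.

```python
def cross(a, b):
    """Cross product from Equation (1.6)."""
    a1, a2, a3 = a
    b1, b2, b3 = b
    return (a2 * b3 - a3 * b2, a3 * b1 - a1 * b3, a1 * b2 - a2 * b1)


def dot(a, b):
    """Dot product from components."""
    return sum(ai * bi for ai, bi in zip(a, b))


def det3(m):
    """Determinant of a 3 x 3 matrix (given as rows), expanded along row 1."""
    (m11, m12, m13), (m21, m22, m23), (m31, m32, m33) = m
    return (m11 * (m22 * m33 - m23 * m32)
            - m12 * (m21 * m33 - m23 * m31)
            + m13 * (m21 * m32 - m22 * m31))


def parallelepiped_volume(a, b, c):
    """Volume |a . (b x c)| of the parallelepiped with edges a, b, c."""
    return abs(dot(a, cross(b, c)))


a, b, c = (1, 1, -2), (2, 0, 3), (0, 1, 1)
print(parallelepiped_volume(a, b, c), abs(det3([a, b, c])))  # both give 9
print(parallelepiped_volume(a, b, (3, 1, 1)))  # c = a + b is coplanar: 0
```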


The concept of vectors as directed line segments that could represent velocities, 
forces, or accelerations has a very long history. Aristotle knew that the effect of two 
forces acting on an object could be described by a single force using what is now 
called the parallelogram law. However, the real development of the concept took an 
unexpected turn in the nineteenth century. 

With the advent of complex numbers and the realization by Gauss, Wessel, and especially Argand that they could be represented by points in a plane, mathematicians discovered that complex numbers could be used to study vectors in a plane. A complex number is represented by a pair⁷ of real numbers, called the real and imaginary parts of the complex number, which could be considered as the two components of a planar vector.

This connection between vectors in a plane and complex numbers was well established by 1830. Vectors are, however, useful only if they are treated as objects in space. After all, velocities, forces, and accelerations are mostly three-dimensional objects. So, the two-dimensional complex numbers had to be generalized to three dimensions. This meant inventing ways of adding, subtracting, multiplying, and dividing objects such as (x, y, z).

The invention of a spatial analogue of the planar complex numbers is due to 
William R. Hamilton. Next to Newton, Hamilton is the greatest of all English 
mathematicians, and like Newton he was even greater as a physicist than as a 
mathematician. At the age of five Hamilton could read Latin, Greek, and Hebrew. 
At eight he added Italian and French; at ten he could read Arabic and Sanskrit, 
and at fourteen, Persian. An encounter with a lightning calculator inspired him to study mathematics. In 1822, at the age of seventeen and a year before he entered Trinity College in Dublin, he prepared a paper on caustics, which was read before the Royal Irish Academy in 1824 but not published. Hamilton was advised to rework
and expand it. In 1827 he submitted to the Academy a revision which initiated the 
science of geometrical optics and introduced new techniques in analytical mechanics. 


7 See Chapter 18. 
In 1827, while still an undergraduate, he was appointed Professor of Astronomy at Trinity College, in which capacity he had to manage the astronomical observations and teach science. He did not do much of the former, but he was a fine lecturer.

Hamilton had very good intuition, and knew how to use analogy to reason from 
the known to the unknown. Although he lacked great flashes of insight, he worked 
very hard and very long on special problems to see what generalizations they would 
lead to. He was patient and systematic in working on specific problems and was 
willing to go through detailed and laborious calculations to check or prove a point. 

After mastering and clarifying the concept of complex numbers and their relation to planar vectors (see Problem 18.11 for the connection between complex multiplication on the one hand, and dot and cross products on the other), Hamilton was able to think more clearly about the three-dimensional generalization. His efforts unfortunately led to frustration because the vectors (a) required four components, and (b) defied commutativity! Both features were revolutionary and set the standard for algebra. He called these new numbers quaternions.

In retrospect, one can see that the new three-dimensional complex numbers had 
to contain four components. Each “number,” when acting on a vector, rotates the 
latter about an axis and stretches (or contracts) it. Two angles are required to 
specify the axis of rotation, one angle to specify the amount of rotation, and a 
fourth number to specify the amount of stretch (or contraction). 

Hamilton announced the invention of quaternions in 1843 at a meeting of the 
Royal Irish Academy, and spent the rest of his life developing the subject. 


1.2 Coordinate Systems 

Coordinates are “functions” that specify points of a space. The smallest 
number of these functions necessary to specify a point is called the dimension 
of that space. For instance, a point of a plane is specified by two numbers, and 
as the point moves in the plane the two numbers change, i.e., the coordinates 
are functions of the position of the point. If we designate the point as P, we 
may write the coordinate functions of P as (f(P),g(P)). 8 Each pair of such 
functions is called a coordinate system. 

Two coordinate systems are commonly used for a plane: Cartesian, denoted by (x(P), y(P)), and polar, denoted by (r(P), θ(P)). As shown in Figure 1.7, the "function" x gives the distance from P to the vertical axis, while θ is the function that gives the angle that the line OP makes with a given fiducial (usually horizontal) line. The origin O and the fiducial line are completely arbitrary. Similarly, the functions r and y give the distances from P to the origin and to the horizontal axis, respectively.

Figure 1.7: Cartesian and polar coordinates of a point P in two dimensions.

8 Think of f (or g) as a rule by which a unique number is assigned to each point P.


Box 1.2.1. In practice, one drops the argument P and writes (x, y) and (r, θ).


We can generalize the above concepts to three dimensions. There are now three coordinate functions, so for a point P in space we write (f(P), g(P), h(P)), where f, g, and h are functions on the three-dimensional space. There are three widely used coordinate systems: Cartesian (x(P), y(P), z(P)), cylindrical (ρ(P), φ(P), z(P)), and spherical (r(P), θ(P), φ(P)). φ(P) is called the azimuth or the azimuthal angle of P, while θ(P) is called its polar angle. To find the spherical coordinates of P, one chooses an arbitrary point as the origin O and an arbitrary line through O called the polar axis. One measures OP and calls it r(P); θ(P) is the angle between OP and the polar axis. To find the third coordinate, we construct the plane through O perpendicular to the polar axis, drop a perpendicular from P to this plane meeting it at H, draw an arbitrary fiducial line through O in this plane, and measure the angle between this line and OH. This angle is φ(P). Cartesian and cylindrical coordinate systems can be described similarly. The three coordinate systems are shown in Figure 1.8. As indicated in the figure, the polar axis is usually taken to be the z-axis, and the fiducial line from which φ(P) is measured is chosen to be the x-axis. Although there are other coordinate systems, the three mentioned above are by far the most widely used.





Figure 1.8: (a) Cartesian, (b) cylindrical, and (c) spherical coordinates of a point P in 
three dimensions. 
Which of the three coordinate systems to use in a given physical problem is dictated mainly by the geometry of that problem. As a rule, spherical coordinates are best suited for spheres and spherically symmetric problems. Spherical symmetry describes situations in which quantities of interest are functions only of the distance from a fixed point and not of the orientation of that distance. Similarly, cylindrical coordinates ease calculations when cylinders or cylindrical symmetries are involved. Finally, Cartesian coordinates are used in rectangular geometries.

Of the three coordinate systems, Cartesian is the most complete in the following sense: a point in space can have only one triplet as its coordinates. This property is not shared by the other two systems. For example, a point P located on the z-axis of a cylindrical coordinate system does not have a well-defined φ(P). In practice, such imperfections are not of dire consequence and we shall ignore them.

Once we have three coordinate systems to work with, we need to know how to translate from one to another. First we give the transformation rule from spherical to cylindrical. It is clear from Figure 1.9 that

$$\rho = r\sin\theta,\qquad \varphi_{\rm cyl} = \varphi_{\rm sph},\qquad z = r\cos\theta. \tag{1.7}$$

Thus, given (r, θ, φ) of a point P, we can obtain (ρ, φ, z) of the same point by substituting in the RHS.

Next we give the transformation rule from cylindrical to Cartesian. Again Figure 1.9 gives the result:

$$x = \rho\cos\varphi,\qquad y = \rho\sin\varphi,\qquad z_{\rm car} = z_{\rm cyl}. \tag{1.8}$$




We can combine (1.7) and (1.8) to connect Cartesian and spherical coordinates:

$$x = r\sin\theta\cos\varphi,\qquad y = r\sin\theta\sin\varphi,\qquad z = r\cos\theta. \tag{1.9}$$





Figure 1.9: The relation between the cylindrical and spherical coordinates of a point 
P can be obtained using this diagram. 
Box 1.2.2. Equations (1.7)-(1.9) are extremely important and worth being committed to memory. The reader is advised to study Figure 1.9 carefully and learn to reproduce (1.7)-(1.9) from the figure!
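Once (1.7) to (1.9) have been reproduced from Figure 1.9, a quick numerical check can confirm them. The Python sketch below is only an illustration: the function names are hypothetical, angles are in radians, and it verifies that applying (1.7) and then (1.8) agrees with (1.9).

```python
import math

def spherical_to_cylindrical(r, theta, phi):
    """Equation (1.7): (r, theta, phi) -> (rho, phi, z); the azimuth is shared."""
    return r * math.sin(theta), phi, r * math.cos(theta)

def cylindrical_to_cartesian(rho, phi, z):
    """Equation (1.8): (rho, phi, z) -> (x, y, z); the z coordinate is shared."""
    return rho * math.cos(phi), rho * math.sin(phi), z

def spherical_to_cartesian(r, theta, phi):
    """Equation (1.9): (r, theta, phi) -> (x, y, z) in one step."""
    return (r * math.sin(theta) * math.cos(phi),
            r * math.sin(theta) * math.sin(phi),
            r * math.cos(theta))

# Composing (1.7) with (1.8) should reproduce (1.9) for any point.
point = (2.0, math.pi / 3, math.pi / 4)   # an arbitrary (r, theta, phi)
via_cylindrical = cylindrical_to_cartesian(*spherical_to_cylindrical(*point))
direct = spherical_to_cartesian(*point)
print(all(math.isclose(a, b) for a, b in zip(via_cylindrical, direct)))  # True
```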


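The transformations given are in their standard form. We can turn them around and give the inverse transformations. For instance, squaring the first and third equations of (1.7) and adding gives ρ² + z² = r², or r = √(ρ² + z²). Similarly, dividing the first equation by the third yields tan θ = ρ/z, which implies that θ = tan⁻¹(ρ/z), or equivalently,

$$\theta = \sin^{-1}\!\left(\frac{\rho}{\sqrt{\rho^2+z^2}}\right) = \cos^{-1}\!\left(\frac{z}{\sqrt{\rho^2+z^2}}\right).$$

Thus, the inverse of (1.7) is

$$r = \sqrt{\rho^2+z^2},\qquad \theta = \tan^{-1}\!\left(\frac{\rho}{z}\right) = \cos^{-1}\!\left(\frac{z}{\sqrt{\rho^2+z^2}}\right),\qquad \varphi_{\rm sph} = \varphi_{\rm cyl}. \tag{1.10}$$

Similarly, the inverse of (1.8) is

$$\rho = \sqrt{x^2+y^2},\qquad \varphi = \tan^{-1}\!\left(\frac{y}{x}\right) = \cos^{-1}\!\left(\frac{x}{\sqrt{x^2+y^2}}\right) = \sin^{-1}\!\left(\frac{y}{\sqrt{x^2+y^2}}\right),\qquad z_{\rm cyl} = z_{\rm car}, \tag{1.11}$$

and that of (1.9) is

$$\begin{aligned} r &= \sqrt{x^2+y^2+z^2},\\ \theta &= \tan^{-1}\!\left(\frac{\sqrt{x^2+y^2}}{z}\right) = \sin^{-1}\!\left(\frac{\sqrt{x^2+y^2}}{\sqrt{x^2+y^2+z^2}}\right) = \cos^{-1}\!\left(\frac{z}{\sqrt{x^2+y^2+z^2}}\right),\\ \varphi &= \tan^{-1}\!\left(\frac{y}{x}\right) = \cos^{-1}\!\left(\frac{x}{\sqrt{x^2+y^2}}\right) = \sin^{-1}\!\left(\frac{y}{\sqrt{x^2+y^2}}\right). \end{aligned} \tag{1.12}$$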


An important question concerns the range of these quantities. In other words: in what range should we allow these quantities to vary in order to cover the whole space? For Cartesian coordinates all three variables vary between −∞ and +∞. Thus,

$$-\infty < x < +\infty,\qquad -\infty < y < +\infty,\qquad -\infty < z < +\infty.$$

The ranges of cylindrical coordinates are

$$0 \le \rho < \infty,\qquad 0 \le \varphi < 2\pi,\qquad -\infty < z < \infty.$$

Note that ρ, being a distance, cannot have negative values. 9 Similarly, the ranges of spherical coordinates are

$$0 \le r < \infty,\qquad 0 \le \theta \le \pi,\qquad 0 \le \varphi < 2\pi.$$

Again, r is never negative for similar reasons as above. Also note that the range of θ excludes values larger than π. This is because the range of φ takes care of points where θ "appears" to have been increased by π.
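When (1.10) to (1.12) are evaluated numerically, a bare tan⁻¹(y/x) only returns angles between −π/2 and π/2, so it cannot tell, say, (1, 1) from (−1, −1). A minimal Python sketch, assuming the ranges just stated, uses the standard-library `math.atan2`, which looks at the signs of both arguments, and then shifts φ into [0, 2π). The function names are hypothetical.

```python
import math

def cartesian_to_spherical(x, y, z):
    """Equation (1.12), returning theta in [0, pi] and phi in [0, 2*pi)."""
    r = math.sqrt(x * x + y * y + z * z)
    # atan2 evaluates tan^-1(sqrt(x^2 + y^2) / z) on the correct branch: theta lands in [0, pi]
    theta = math.atan2(math.hypot(x, y), z)
    # atan2 uses the signs of y and x to pick the quadrant; map (-pi, pi] onto [0, 2*pi)
    phi = math.atan2(y, x)
    if phi < 0:
        phi += 2 * math.pi
    return r, theta, phi

def cartesian_to_cylindrical(x, y, z):
    """Equation (1.11), with phi in [0, 2*pi)."""
    phi = math.atan2(y, x)
    if phi < 0:
        phi += 2 * math.pi
    return math.hypot(x, y), phi, z

def spherical_to_cartesian(r, theta, phi):
    """Equation (1.9), used here only to check the round trip."""
    return (r * math.sin(theta) * math.cos(phi),
            r * math.sin(theta) * math.sin(phi),
            r * math.cos(theta))

# (-1, -1, 1) lies over the third quadrant of the xy-plane: phi should be 5*pi/4,
# whereas tan^-1(y/x) = tan^-1(1) on its own would give pi/4.
r, theta, phi = cartesian_to_spherical(-1.0, -1.0, 1.0)
print(math.isclose(phi, 5 * math.pi / 4))   # True
back = spherical_to_cartesian(r, theta, phi)
print(all(math.isclose(a, b) for a, b in zip(back, (-1.0, -1.0, 1.0))))   # True
```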

One of the greatest achievements in the development of mathematics since Euclid was the introduction of coordinates. Two men take credit for this development: Fermat and Descartes. These two great French mathematicians were interested in the unification of geometry and algebra, which resulted in the creation of a most fruitful branch of mathematics now called analytic geometry. Fermat and Descartes, who were heavily involved in physics, were keenly aware of both the need for quantitative methods and the capacity of algebra to deliver them.

Fermat’s interest in the unification of geometry and algebra arose because of his involvement in optics. His interest in maxima and minima, and thus his contribution to calculus, stemmed from the investigation of the passage of light rays through media of different indices of refraction, which resulted in Fermat’s principle in optics and the law of refraction. With the introduction of coordinates,
Fermat was able to quantify the study of optics and set a trend to which all physicists 
of posterity would adhere. It is safe to say that without analytic geometry the 
progress of science, and in particular physics, would have been next to impossible. 

Born into a family of tradespeople, Pierre de Fermat was trained as a lawyer and made his living in this profession, becoming a councillor of the parliament of
the city of Toulouse. Although mathematics was but a hobby for him and he could 
devote only spare time to it, he made great contributions to number theory, to 
calculus, and, together with Pascal, initiated work on probability theory. 

The coordinate system introduced by Fermat was not a convenient one. For one 
thing, the coordinate axes were not at right angles to one another. Furthermore, 
the use of negative coordinates was not considered. Nevertheless, he was able to 
translate geometric curves into algebraic equations. 

René Descartes was a great philosopher, a founder of modern biology, and a
superb physicist and mathematician. His interest in mathematics stemmed from his 
desire to understand nature. He wrote: 

... I have resolved to quit only abstract geometry, that is to say, the 
consideration of questions which serve only to exercise the mind, and 
this, in order to study another kind of geometry, which has for its object 
the explanation of the phenomena of nature. 

His father, a relatively wealthy lawyer, sent him to a Jesuit school at the age of eight where, due to his delicate health, he was allowed to spend the mornings in bed, during which time he worked. He followed this habit during his entire life. At twenty he graduated from the University of Poitiers as a lawyer and went to Paris, where he studied mathematics with a Jesuit priest. After one year he decided to join the army of Prince Maurice of Orange in 1617. During the next nine years he vacillated between various armies while studying mathematics.

9 In some calculus books ρ is allowed to have negative values to account for points on the opposite side of the origin. However, in the physics literature ρ is assumed to be positive. To go to "the other side" of the origin along ρ, we change φ by π, keeping ρ positive at all times.

He eventually returned to Paris, where he devoted his efforts to the study of 
optical instruments motivated by the newly discovered power of the telescope. In 
1628 he moved to Holland to a quieter and freer intellectual environment. There he 
lived for the next twenty years and wrote his famous works. In 1649 Queen Christina 
of Sweden persuaded Descartes to go to Stockholm as her private tutor. However,
the Queen had an uncompromising desire to draw curves and tangents at 5 a.m., 
causing Descartes to break the lifelong habit of getting up at 11 o’clock! After only 
a few months in the cold northern climate, walking to the palace for the 5 o’clock 
appointment with the queen, he died of pneumonia in 1650. 

Descartes described his algebraic approach to geometry in his monumental work 
La Géométrie. It is in this work that he solves geometrical problems using algebra
by introducing coordinates. These coordinates, as in Fermat’s case, were not lengths 
along perpendicular axes. Nevertheless they paved the way for the later generations 
of scientists such as Newton to build on Descartes’ and Fermat’s ideas and improve 
on them. 

Throughout the seventeenth century, mathematicians used one axis with the y 
values drawn at an oblique or right angle onto that axis. Newton, however, in a book 
called The Method of Fluxions and Infinite Series written in 1671, and translated 
much later into English in 1736, describes a coordinate system in which points are 
located in reference to a fixed point and a fixed line through that point. This was 
the first introduction of essentially the polar coordinates we use today. 


1.3 Vectors in Different Coordinate Systems 

Many physical situations require the study of vectors in different coordinate systems. For example, the study of the solar system is best done in spherical coordinates because of the nature of the gravitational force. Similarly, the calculation of electromagnetic fields in a cylindrical cavity will be easier if we use cylindrical coordinates. This requires not only writing functions in terms of these coordinate variables, but also expressing vectors in terms of unit vectors suitable for these coordinate systems. It turns out that, for the three coordinate systems described above, the most natural construction of such vectors renders them mutually perpendicular.

Any set of three (two) mutually perpendicular unit vectors in space (in the 
plane) is called an orthonormal basis. 10 Basis vectors have the property 
that any vector can be written in terms of them. 

Let us start with the plane, in which the coordinate system could be Cartesian or polar. In general, we construct an orthonormal basis at a point and note that


10 The word "orthonormal" comes from orthogonal, meaning "perpendicular," and normal, meaning "of unit length."
Figure 1.10: The unit vectors in (a) Cartesian coordinates and (b) polar coordinates.
The unit vectors at P and Q are the same for Cartesian coordinates, but different in 
polar coordinates. 


Box 1.3.1. The orthonormal basis, generally speaking, depends on the 
point at which it is constructed. 


The vectors of a basis are constructed as follows. To find the unit vector corresponding to a coordinate at a point P, hold the other coordinate fixed and increase the coordinate in question. The initial direction of motion of P is the direction of the unit vector sought. Thus, we obtain the Cartesian unit vectors at point P of Figure 1.10(a): e_x is obtained by holding y fixed and letting x vary in the increasing direction; and e_y is obtained by holding x fixed at P and letting y increase. In each case, the unit vectors show the initial direction of the motion of P. It should be clear that one obtains the same set of unit vectors regardless of the location of P. However, the reader should note that this is true only for coordinates that are defined in terms of axes whose directions are fixed, such as Cartesian coordinates.

If we use polar coordinates for P, then holding θ fixed at P gives the direction of e_r as shown in Figure 1.10(b), because for fixed θ, that is the direction of increase for r. Similarly, if r is fixed at P, the initial direction of motion of P when θ is increased is that of e_θ shown in the figure. If we choose another point such as Q shown in the figure, then a new set of unit vectors will be obtained which are different from those of P. This is because polar coordinates are not defined in terms of any fixed axes.

Since {e_x, e_y} and {e_r, e_θ} form a basis in the plane, any vector a in the plane can be expressed in terms of either basis as shown in Figure 1.11. Thus, we can write

$$\mathbf{a} = a_{xP}\,\mathbf{e}_{xP} + a_{yP}\,\mathbf{e}_{yP} = a_{rP}\,\mathbf{e}_{rP} + a_{\theta P}\,\mathbf{e}_{\theta P} = a_{rQ}\,\mathbf{e}_{rQ} + a_{\theta Q}\,\mathbf{e}_{\theta Q}, \tag{1.13}$$

where the components and unit vectors are subscripted to emphasize their dependence on the points at which the unit vectors are erected. In the case of Cartesian coordinates, this, of course, is not necessary because the unit vectors happen to be independent of the point. In the case of polar coordinates, although this dependence exists, we normally do not write the points as subscripts, being aware of this dependence every time we use polar coordinates.

Figure 1.11: (a) The vector a has the same components along unit vectors at P and Q in Cartesian coordinates. (b) The vector a has different components along unit vectors at different points for a polar coordinate system.

So far we have used parentheses to designate the (components of) a vector. Since parentheses are universally used for coordinates of points, we shall write components of a vector in angle brackets. So Equation (1.13) can also be written as

$$\mathbf{a} = \langle a_x, a_y\rangle_P = \langle a_r, a_\theta\rangle_P = \langle a_r, a_\theta\rangle_Q,$$

where again the subscript indicating the point at which the unit vectors are defined is normally deleted. However, we need to keep in mind that although ⟨a_x, a_y⟩ is independent of the point in question, ⟨a_r, a_θ⟩ is very much point-dependent. Caution should be exercised when using this notation as to the location of the unit vectors.

The unit vectors in the coordinate systems of space are defined the same 
way. We follow the rule given before: 


Box 1.3.2. (Rule for Finding Coordinate Unit Vectors). To find the unit vector corresponding to a coordinate at a point P, hold the other coordinates fixed and increase the coordinate in question. The initial direction of motion of P is the direction of the unit vector sought.


It should be clear that the Cartesian basis {e_x, e_y, e_z} is the same for all points, and usually its vectors are drawn at the origin along the three axes. An arbitrary vector a can be written as

$$\mathbf{a} = a_x\mathbf{e}_x + a_y\mathbf{e}_y + a_z\mathbf{e}_z \quad\text{or}\quad \mathbf{a} = \langle a_x, a_y, a_z\rangle, \tag{1.14}$$

where we used angle brackets to denote components of the vector, reserving the parentheses for coordinates of points in space.
Figure 1.12: Unit vectors of cylindrical coordinates.


The unit vectors at a point P in the other coordinate systems are obtained similarly. In cylindrical coordinates, e_ρ lies along ρ and points in the direction of increasing ρ at P; e_φ is perpendicular to the plane formed by P and the z-axis and points in the direction of increasing φ; e_z points in the direction of positive z (see Figure 1.12). We note that only e_z is independent of the point at which the unit vectors are defined, because z is a fixed axis in cylindrical coordinates. Given any vector a, we can write it as

$$\mathbf{a} = a_\rho\mathbf{e}_\rho + a_\varphi\mathbf{e}_\varphi + a_z\mathbf{e}_z \quad\text{or}\quad \mathbf{a} = \langle a_\rho, a_\varphi, a_z\rangle. \tag{1.15}$$

The unit vectors in spherical coordinates are defined similarly: e_r is taken along r and points in the direction of increasing r; this direction is called radial; e_θ is taken to lie in the plane formed by P and the z-axis, is perpendicular to r, and points in the direction of increasing θ; e_φ is as in the cylindrical case (Figure 1.13). An arbitrary vector in space can be expressed in terms of the spherical unit vectors at P:

$$\mathbf{a} = a_r\mathbf{e}_r + a_\theta\mathbf{e}_\theta + a_\varphi\mathbf{e}_\varphi \quad\text{or}\quad \mathbf{a} = \langle a_r, a_\theta, a_\varphi\rangle. \tag{1.16}$$

It should be emphasized that 


Box 1.3.3. The cylindrical and spherical unit vectors e_ρ, e_r, e_θ, and e_φ are dependent on the position of P.
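Box 1.3.2 can be carried out literally with a finite difference: nudge one spherical coordinate, keep the other two fixed, and normalize the resulting displacement of P. The Python sketch below (hypothetical function names; a central difference with an arbitrarily chosen step h) compares the result with the standard closed-form Cartesian components of e_r, e_θ, and e_φ, quoted here without derivation. It also shows that e_r changes when P moves, as Box 1.3.3 states.

```python
import math

def spherical_to_cartesian(r, theta, phi):
    # Equation (1.9)
    return (r * math.sin(theta) * math.cos(phi),
            r * math.sin(theta) * math.sin(phi),
            r * math.cos(theta))

def unit_vector_by_rule(coords, index, h=1e-6):
    """Box 1.3.2, numerically: move coordinate `index` slightly (others fixed)
    and normalize the resulting displacement of P (central difference)."""
    plus, minus = list(coords), list(coords)
    plus[index] += h
    minus[index] -= h
    d = [a - b for a, b in zip(spherical_to_cartesian(*plus), spherical_to_cartesian(*minus))]
    norm = math.sqrt(sum(c * c for c in d))
    return [c / norm for c in d]

def spherical_basis(theta, phi):
    """Closed-form Cartesian components of e_r, e_theta, e_phi at a point with angles (theta, phi)."""
    e_r = [math.sin(theta) * math.cos(phi), math.sin(theta) * math.sin(phi), math.cos(theta)]
    e_theta = [math.cos(theta) * math.cos(phi), math.cos(theta) * math.sin(phi), -math.sin(theta)]
    e_phi = [-math.sin(phi), math.cos(phi), 0.0]
    return e_r, e_theta, e_phi

P = (2.0, 0.7, 1.2)   # arbitrary (r, theta, phi), away from the z-axis where e_phi is undefined
for i, exact in enumerate(spherical_basis(P[1], P[2])):
    numeric = unit_vector_by_rule(P, i)
    print(max(abs(a - b) for a, b in zip(numeric, exact)))   # each value is close to zero

Q = (2.0, 0.7, 2.5)   # same r and theta as P, different phi
print(spherical_basis(P[1], P[2])[0])   # e_r at P
print(spherical_basis(Q[1], Q[2])[0])   # e_r at Q: a different vector
```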


Once an origin O is designated, every point P in space will define a vector, called a position vector and denoted by r. This is simply the vector drawn from O to P. In Cartesian coordinates this vector has components ⟨x, y, z⟩; thus one can write

$$\mathbf{r} = x\,\mathbf{e}_x + y\,\mathbf{e}_y + z\,\mathbf{e}_z. \tag{1.17}$$


Figure 1.13: Unit vectors of spherical coordinates. Note that the intersection of the shaded plane with the xy-plane is a line along the cylindrical coordinate ρ.

But (x, y, z) are also the coordinates of the point P. This can be a source of confusion when other coordinate systems are used. For example, in spherical coordinates, the components of the vector r at P are ⟨r, 0, 0⟩ because r has only a component along e_r and none along e_θ or e_φ. One writes 11

$$\mathbf{r} = r\,\mathbf{e}_r. \tag{1.18}$$

However, the coordinates of P are still (r, θ, φ)! Similarly, the coordinates of P are (ρ, φ, z) in a cylindrical system, while

$$\mathbf{r} = \rho\,\mathbf{e}_\rho + z\,\mathbf{e}_z, \tag{1.19}$$

because r lies in the ρz-plane and has no component along e_φ. Therefore,


Box 1.3.4. Make a clear distinction between the components of the 
vector r and the coordinates of the point P. 


A common symptom of confusing components with coordinates is as follows. Point P_1 has position vector r_1 with spherical components ⟨r_1, 0, 0⟩ at P_1. The position vector of a second point P_2 is r_2 with spherical components ⟨r_2, 0, 0⟩ at P_2. It is easy to fall into the trap of thinking that r_1 − r_2 has spherical components ⟨r_1 − r_2, 0, 0⟩! This is, of course, not true, because the spherical unit vectors at P_1 are completely different from those at P_2, and, therefore, contrary to the Cartesian case, we cannot simply subtract components.
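The trap just described is easy to exhibit numerically. In this Python sketch (hypothetical helper names; the two points are chosen arbitrarily on the unit circle in the xy-plane), both position vectors have spherical components ⟨1, 0, 0⟩ at their own points, so naive subtraction gives the zero vector, while the true difference, resolved along the unit vectors erected at P_1, has length √2.

```python
import math

def to_cartesian(r, theta, phi):
    # Equation (1.9)
    return (r * math.sin(theta) * math.cos(phi),
            r * math.sin(theta) * math.sin(phi),
            r * math.cos(theta))

def spherical_basis(theta, phi):
    # Cartesian components of e_r, e_theta, e_phi at a point with angles (theta, phi)
    return ((math.sin(theta) * math.cos(phi), math.sin(theta) * math.sin(phi), math.cos(theta)),
            (math.cos(theta) * math.cos(phi), math.cos(theta) * math.sin(phi), -math.sin(theta)),
            (-math.sin(phi), math.cos(phi), 0.0))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

P1 = (1.0, math.pi / 2, 0.0)          # spherical coordinates of P1, the point (1, 0, 0)
P2 = (1.0, math.pi / 2, math.pi / 2)  # spherical coordinates of P2, the point (0, 1, 0)

# Wrong: subtract <r1, 0, 0> and <r2, 0, 0> as if they referred to the same unit vectors
naive = (P1[0] - P2[0], 0.0, 0.0)

# Right: subtract Cartesian components, then resolve along the unit vectors erected at P1
diff = [a - b for a, b in zip(to_cartesian(*P1), to_cartesian(*P2))]
actual = tuple(dot(diff, e) for e in spherical_basis(P1[1], P1[2]))

print(naive)    # (0.0, 0.0, 0.0)
print(actual)   # approximately (1, 0, -1), a vector of length sqrt(2)
```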

11 We should really label everything with P. But, as usual, we assume this labeling to be 
implied. 
One of the great advantages of vectors is their ability to express results independent of any specific coordinate system. Physical laws are always coordinate-independent. For example, when we write F = ma, both F and a could be expressed in terms of Cartesian, spherical, cylindrical, or any other convenient coordinate system. This independence allows us the freedom to choose the coordinate system most convenient for the problem at hand. For example, it is extremely difficult to solve for planetary motion in Cartesian coordinates, while the use of spherical coordinates simplifies the solution tremendously.

Example 1.3.1. We can express the coordinates of the center of mass (CM) of a collection of particles in terms of their position vectors. 12 Thus, if r denotes the position vector of the CM of the collection of N mass points, m_1, m_2, ..., m_N, with respective position vectors r_1, r_2, ..., r_N relative to an origin O, then 13

$$\mathbf{r} = \frac{m_1\mathbf{r}_1 + m_2\mathbf{r}_2 + \cdots + m_N\mathbf{r}_N}{m_1 + m_2 + \cdots + m_N} = \frac{\sum_{k=1}^{N} m_k\mathbf{r}_k}{M}, \tag{1.20}$$

where M = Σ_{k=1}^{N} m_k is the total mass of the system. One can also think of Equation (1.20) as a vector equation. To find the component equations in a coordinate system, one needs to pick a fixed point (say the origin), a set of unit vectors at that point (usually the unit vectors along the axes of some coordinate system), and substitute the components of r_k along those unit vectors to find the components of r along the unit vectors. ■
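As footnote 12 points out, (1.20) is applied component by component most naturally in Cartesian coordinates, where the components of each r_k coincide with the coordinates of its point. A short Python sketch with made-up masses and positions (the function name is hypothetical):

```python
import math

def center_of_mass(masses, positions):
    """Equation (1.20), evaluated component by component in Cartesian coordinates."""
    M = math.fsum(masses)   # total mass
    return tuple(math.fsum(m * p[i] for m, p in zip(masses, positions)) / M for i in range(3))

masses = [1.0, 2.0, 3.0]                                         # hypothetical masses
positions = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]  # hypothetical r_k
print(center_of_mass(masses, positions))   # (0.3333333333333333, 0.5, 0.0)
```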


1.3.1 Fields and Potentials 

The distributive property of the dot product and the fact that the unit vectors of the bases in all coordinate systems are mutually perpendicular can be used to derive the following:

$$\begin{aligned} \mathbf{a}\cdot\mathbf{b} &= a_x b_x + a_y b_y + a_z b_z &&\text{(Cartesian)},\\ \mathbf{a}\cdot\mathbf{b} &= a_\rho b_\rho + a_\varphi b_\varphi + a_z b_z &&\text{(cylindrical)},\\ \mathbf{a}\cdot\mathbf{b} &= a_r b_r + a_\theta b_\theta + a_\varphi b_\varphi &&\text{(spherical)}. \end{aligned} \tag{1.21}$$

The first of these equations is the same as (1.3).

It is important to keep in mind that the components are to be expressed in the same set of unit vectors. This typically means setting up mutually perpendicular unit vectors (an orthonormal basis) at a single point and resolving all vectors along those unit vectors.
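To see (1.21) at work, one can resolve two vectors along the spherical unit vectors erected at a single point and check that the spherical formula reproduces the Cartesian one. The Python sketch below uses hypothetical vectors, a hypothetical point (given by its angles), and the standard closed-form components of e_r, e_θ, e_φ, quoted without derivation.

```python
import math

def spherical_basis(theta, phi):
    """Cartesian components of e_r, e_theta, e_phi at a point with angles (theta, phi)."""
    return ([math.sin(theta) * math.cos(phi), math.sin(theta) * math.sin(phi), math.cos(theta)],
            [math.cos(theta) * math.cos(phi), math.cos(theta) * math.sin(phi), -math.sin(theta)],
            [-math.sin(phi), math.cos(phi), 0.0])

def dot(u, v):
    # Sum of products of matching components, as in each line of (1.21)
    return sum(a * b for a, b in zip(u, v))

def components_along(vec, basis):
    """Resolve a Cartesian vector along an orthonormal basis: each component is a dot product."""
    return [dot(vec, e) for e in basis]

a = [1.0, -2.0, 0.5]
b = [3.0, 0.25, -4.0]
basis_at_P = spherical_basis(theta=1.1, phi=-0.4)   # unit vectors erected at one chosen point P

a_sph = components_along(a, basis_at_P)   # <a_r, a_theta, a_phi> at P
b_sph = components_along(b, basis_at_P)   # <b_r, b_theta, b_phi> at the same P
print(dot(a, b), dot(a_sph, b_sph))       # the two numbers agree up to rounding
```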

The dot product, in various forms and guises, has many applications in 
physics. As pointed out earlier, it was introduced in the definition of work, 
but soon spread to many other concepts of physics. One of the simplest, and most important, applications is its use in writing the laws of physics in a coordinate-independent way.

12 This implies that the equation is most useful only when Cartesian coordinates are 
used, because only for these coordinates do the components of the position vector of a 
point coincide with the coordinates of that point. 

13 We assume that the reader is familiar with the symbol Σ simply as a summation symbol. We shall discuss its properties and ways of manipulating it in Chapter 9.
Figure 1.14: The diagram illustrating the electrical force when one charge is at the origin.


Example 1.3.2. A point charge q is situated at the origin. A second charge q' is located at (x, y, z), as shown in Figure 1.14. We want to express the electric force on q' in Cartesian, spherical, and cylindrical coordinate systems.

We know that the electric force, as given by Coulomb's law, lies along the line joining the two charges and is either attractive or repulsive according to the signs of q and q'. All of this information can be summarized in the formula

$$\mathbf{F}_{q'} = \frac{k_e q q'}{r^2}\,\mathbf{e}_r, \tag{1.22}$$

where k_e = 1/(4πε₀) ≈ 9 × 10⁹ in SI units. Note that if q and q' are unlike, qq' < 0 and F_{q'} is opposite to e_r, i.e., it is attractive. On the other hand, if q and q' are of the same sign, qq' > 0 and F_{q'} is in the same direction as e_r, i.e., repulsive.

Equation (1.22) expresses F_{q'} in spherical coordinates. Thus, its components in terms of unit vectors at q' are ⟨k_e qq'/r², 0, 0⟩. To get the components in the other coordinate systems, we rewrite (1.22). Noting that e_r = r/r, we write

$$\mathbf{F}_{q'} = \frac{k_e q q'}{r^2}\,\frac{\mathbf{r}}{r} = \frac{k_e q q'}{r^3}\,\mathbf{r}. \tag{1.23}$$


For Cartesian coordinates we use (1.12) to obtain r³ = (x² + y² + z²)^{3/2}. Substituting this and (1.17) in (1.23) yields

$$\mathbf{F}_{q'} = \frac{k_e q q'}{(x^2+y^2+z^2)^{3/2}}\,(x\,\mathbf{e}_x + y\,\mathbf{e}_y + z\,\mathbf{e}_z).$$

Therefore, the components of F_{q'} in Cartesian coordinates are

$$\left\langle \frac{k_e q q' x}{(x^2+y^2+z^2)^{3/2}},\; \frac{k_e q q' y}{(x^2+y^2+z^2)^{3/2}},\; \frac{k_e q q' z}{(x^2+y^2+z^2)^{3/2}} \right\rangle.$$


Finally, using (1.10) and (1.19) in (1.23), we obtain

$$\mathbf{F}_{q'} = \frac{k_e q q'}{(\rho^2+z^2)^{3/2}}\,(\rho\,\mathbf{e}_\rho + z\,\mathbf{e}_z).$$
Figure 1.15: The displacement vector between P_1 and P_2 is the difference between their position vectors.


Thus the components of F_{q'} along the cylindrical unit vectors constructed at the location of q' are

$$\left\langle \frac{k_e q q' \rho}{(\rho^2+z^2)^{3/2}},\; 0,\; \frac{k_e q q' z}{(\rho^2+z^2)^{3/2}} \right\rangle. \qquad\blacksquare$$
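Example 1.3.2 can be checked numerically: the three sets of components must describe vectors of the same magnitude, and the cylindrical components must turn into the Cartesian ones via e_ρ = cos φ e_x + sin φ e_y. The Python sketch below uses hypothetical charges, a hypothetical position for q', and a rounded value of k_e, all chosen only for illustration.

```python
import math

K_E = 8.99e9                      # rounded SI value of 1/(4*pi*eps0) (assumed for illustration)
q, q_prime = 1.0e-6, 2.0e-6       # hypothetical charges in coulombs
x, y, z = 0.3, 0.4, 1.2           # hypothetical position of q' in metres; q sits at the origin

r = math.sqrt(x * x + y * y + z * z)
rho = math.hypot(x, y)
phi = math.atan2(y, x)

# Spherical components at q': only the radial one survives, Equation (1.22)
F_sph = (K_E * q * q_prime / r**2, 0.0, 0.0)

# Cartesian components, from (1.23) with r^3 = (x^2 + y^2 + z^2)^(3/2)
c = K_E * q * q_prime / (x * x + y * y + z * z) ** 1.5
F_car = (c * x, c * y, c * z)

# Cylindrical components at q', from (1.23) with r = rho e_rho + z e_z
d = K_E * q * q_prime / (rho * rho + z * z) ** 1.5
F_cyl = (d * rho, 0.0, d * z)

def norm(v):
    return math.sqrt(sum(t * t for t in v))

print(norm(F_sph), norm(F_car), norm(F_cyl))   # the same magnitude three ways
# Rewriting e_rho in Cartesian form recovers the Cartesian components
print(F_cyl[0] * math.cos(phi), F_cyl[0] * math.sin(phi), F_cyl[2])
```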

Since r gives the position of a point in space, one can use it to write the distance between two points P_1 and P_2 with position vectors r_1 and r_2. Figure 1.15 shows that r_2 − r_1 is the displacement vector from P_1 to P_2, so the distance between the two points is |r_2 − r_1|. The importance of this vector stems from the fact that many physical quantities are functions of distances between point particles, and r_2 − r_1 is a concise way of expressing this distance. The following example illustrates this.


During the second half of the eighteenth century many physicists were engaged in a 
quantitative study of electricity and magnetism. Charles Augustin de Coulomb, 
who developed the so-called torsion balance for measuring weak forces, is credited 
with the discovery of the law governing the force between electrical charges. 

Coulomb was an army engineer in the West Indies. After spending nine years there, he returned to France in poor health. When the French Revolution began, he retired to the country to do scientific research.

Besides his experiments on electricity, Coulomb worked on applied mechanics,
structural analysis, the fracture of beams and columns, the thrust of arches, and the 
thrust of the soil. 

At about the same time that Coulomb discovered the law of electricity, there 
lived in England a very reclusive character named Henry Cavendish. He was 
born into the nobility, had no close friends, was afraid of women, and was uninterested in music or arts of any kind.
chemistry that he carried out in a private laboratory located in his large mansion. 

During his long life he published only a handful of relatively unimportant papers. But after his death about one million pounds sterling were found in his bank account and twenty bundles of notes in his laboratory. These notes remained in the possession of his relatives for a long time, but when they were published one hundred years later, it became clear that Henry Cavendish was one of the greatest experimental physicists ever. He discovered all the laws of electric and magnetic interactions at the same time as Coulomb, and his work in chemistry matches that of Lavoisier. Furthermore, he used a torsion balance to measure the universal gravitational constant for the first time, and as a result was able to arrive at an accurate value for the mass of the Earth.


Example 1.3.3. Coulomb's law for two arbitrary charges

Suppose there are point charges q_1 at P_1 and q_2 at P_2. Let us write the force exerted on q_2 by q_1. The magnitude of the force is

$$F_{21} = \frac{k_e q_1 q_2}{d^2},$$

where d = P_1P_2 is the distance between the two charges. We use d because the usual notation r has a special meaning for us: it is one of the coordinates in spherical systems. If we multiply this magnitude by the unit vector describing the direction of the force, we obtain the full force vector (see Box 1.1.1). But, assuming repulsion for the moment, this unit vector is

$$\mathbf{e}_{21} = \frac{\mathbf{r}_2 - \mathbf{r}_1}{|\mathbf{r}_2 - \mathbf{r}_1|}.$$

Also, since d = |r_2 − r_1|, we have

$$\mathbf{F}_{21} = \frac{k_e q_1 q_2}{d^2}\,\mathbf{e}_{21} = \frac{k_e q_1 q_2}{|\mathbf{r}_2-\mathbf{r}_1|^2}\,\frac{\mathbf{r}_2-\mathbf{r}_1}{|\mathbf{r}_2-\mathbf{r}_1|},$$

or

$$\mathbf{F}_{21} = \frac{k_e q_1 q_2}{|\mathbf{r}_2-\mathbf{r}_1|^3}\,(\mathbf{r}_2-\mathbf{r}_1). \tag{1.24}$$

Although we assumed repulsion, we see that (1.24) includes attraction as well. Indeed, if q_1q_2 < 0, F_21 is opposite to r_2 − r_1, i.e., F_21 is directed from P_2 to P_1. Since F_21 is the force on q_2 by q_1, this is an attraction. We also note that Newton's third law is included in (1.24):

$$\mathbf{F}_{12} = \frac{k_e q_2 q_1}{|\mathbf{r}_1-\mathbf{r}_2|^3}\,(\mathbf{r}_1-\mathbf{r}_2) = -\mathbf{F}_{21},$$

because r_2 − r_1 = −(r_1 − r_2) and |r_2 − r_1| = |r_1 − r_2|. We can also write the gravitational force immediately:

$$\mathbf{F}_{21} = -\frac{G m_1 m_2}{|\mathbf{r}_2-\mathbf{r}_1|^3}\,(\mathbf{r}_2-\mathbf{r}_1), \tag{1.25}$$

where m_1 and m_2 are point masses and the minus sign is introduced to ensure attraction. ■
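Equations (1.24) and (1.25) translate directly into code once vectors are stored as Cartesian components. The Python sketch below (hypothetical function names, charges, masses, and positions; rounded SI values of k_e and G) checks the two properties claimed in the example: Newton's third law, and attraction meaning that F_21 points opposite to r_2 − r_1.

```python
import math

K_E = 8.99e9    # rounded SI value of 1/(4*pi*eps0) (assumed for illustration)
G = 6.674e-11   # rounded SI value of the gravitational constant (assumed)

def sub(u, v):
    return [a - b for a, b in zip(u, v)]

def norm(v):
    return math.sqrt(sum(t * t for t in v))

def coulomb_force(q_src, r_src, q_on, r_on):
    """Equation (1.24): force on charge q_on at r_on due to charge q_src at r_src."""
    d = sub(r_on, r_src)                 # r_2 - r_1: from the source to the charge acted upon
    c = K_E * q_src * q_on / norm(d) ** 3
    return [c * t for t in d]

def gravitational_force(m_src, r_src, m_on, r_on):
    """Equation (1.25): the minus sign makes the force attractive."""
    d = sub(r_on, r_src)
    c = -G * m_src * m_on / norm(d) ** 3
    return [c * t for t in d]

r1, r2 = [0.0, 0.0, 0.0], [1.0, 2.0, 2.0]   # hypothetical positions; |r2 - r1| = 3
F21 = coulomb_force(1e-6, r1, -3e-6, r2)     # unlike charges, so attraction
F12 = coulomb_force(-3e-6, r2, 1e-6, r1)
print(F21)
print(F12)   # component by component, the negative of F21
```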


Now that we have expressions for electric and gravitational forces, we can obtain the electric field of a point charge and the gravitational field of a point mass. First recall that the electric field at a point P is defined to be the force on a test charge q located at P divided by q. Thus if we have a charge q_1 at P_1 with position vector r_1 and we are interested in its field at P with position vector r, we introduce a charge q at r and calculate the force on q from Equation (1.24):

$$\mathbf{F} = \frac{k_e q_1 q}{|\mathbf{r}-\mathbf{r}_1|^3}\,(\mathbf{r}-\mathbf{r}_1).$$

Dividing by q gives

$$\mathbf{E}_1 = \frac{k_e q_1}{|\mathbf{r}-\mathbf{r}_1|^3}\,(\mathbf{r}-\mathbf{r}_1), \tag{1.26}$$

where we have given the field the same index as the charge producing it. The calculation of the gravitational field follows similarly. The result is

$$\mathbf{g}_1 = -\frac{G m_1}{|\mathbf{r}-\mathbf{r}_1|^3}\,(\mathbf{r}-\mathbf{r}_1). \tag{1.27}$$

In (1.26) and (1.27), P is called the field point and P_1 the source point. Note that in both expressions, the field position vector comes first.
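The definition used above (force on a test charge divided by that charge) can be checked against (1.26) directly. In this Python sketch the source and field points, the charges, and the rounded constants are hypothetical; both field functions take the field point and subtract the source point from it, in line with the remark that the field position vector comes first.

```python
import math

K_E = 8.99e9    # rounded SI value of 1/(4*pi*eps0) (assumed for illustration)
G = 6.674e-11   # rounded SI value of the gravitational constant (assumed)

def sub(u, v):
    return [a - b for a, b in zip(u, v)]

def norm(v):
    return math.sqrt(sum(t * t for t in v))

def electric_field(q_src, r_src, r_field):
    """Equation (1.26): field at r_field produced by a point charge q_src at r_src."""
    d = sub(r_field, r_src)          # field position vector first, then the source
    c = K_E * q_src / norm(d) ** 3
    return [c * t for t in d]

def gravitational_field(m_src, r_src, r_field):
    """Equation (1.27): the minus sign makes g point back toward the source mass."""
    d = sub(r_field, r_src)
    c = -G * m_src / norm(d) ** 3
    return [c * t for t in d]

r1 = [1.0, 0.0, 0.0]   # source point P1 (hypothetical)
r = [1.0, 3.0, 4.0]    # field point P, at distance 5 from P1
q1, q_test = 2e-9, 1e-12

E1 = electric_field(q1, r1, r)

# Definition check: force on a test charge at P from (1.24), divided by that charge
d = sub(r, r1)
F = [K_E * q1 * q_test / norm(d) ** 3 * t for t in d]
print(E1)
print([f / q_test for f in F])   # the same vector, up to rounding
```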

If there are several point charges (or masses) producing an electric (gravitational) field, we simply add the contributions from each source. The principle behind this procedure is called the superposition principle. It is a principle that "seems" intuitively obvious, but upon further reflection its validity becomes surprising. Suppose a charge q_1 produces a field E_1 around itself. Now we introduce a second charge q_2 which, when far away and isolated from any other charges, produces a field E_2 around itself. It is not at all obvious that once we bring these charges together, the individual fields should not change. After all, this is not what happens to human beings! We act completely differently when we are alone than when we are in the company of others. The presence of others drastically changes our individual behaviors. Nevertheless, charges and masses, unfettered by any social chains, retain their individuality and produce fields as if no other charges were present.

It is important to keep in mind that the superposition principle applies
only to point sources. For example, a charged conducting sphere will not
produce the same field when another charge is introduced nearby, because the
presence of the new charge alters the charge distribution of the sphere and
indeed does change the sphere's field. However, each individual point charge
(electron) on the sphere, whatever location on the sphere it happens to end
up in, will retain its individual electric field.¹⁴

Going back to the electric field, we can write

$$\mathbf{E} = \mathbf{E}_1 + \mathbf{E}_2 + \cdots + \mathbf{E}_n$$

for $n$ point charges $q_1, q_2, \ldots, q_n$ (see Figure 1.16). Substituting from (1.26),
with appropriate indices, we obtain


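$$\mathbf{E} = \frac{k_e q_1}{|\mathbf{r} - \mathbf{r}_1|^3}(\mathbf{r} - \mathbf{r}_1) + \frac{k_e q_2}{|\mathbf{r} - \mathbf{r}_2|^3}(\mathbf{r} - \mathbf{r}_2) + \cdots + \frac{k_e q_n}{|\mathbf{r} - \mathbf{r}_n|^3}(\mathbf{r} - \mathbf{r}_n),$$

or, using the summation symbol, we obtain

$$\mathbf{E} = \sum_{i=1}^{n} \frac{k_e q_i}{|\mathbf{r} - \mathbf{r}_i|^3}(\mathbf{r} - \mathbf{r}_i).$$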

14 The superposition principle, which in the case of electrostatics and gravity is needed 
to calculate the fields of large sources consisting of many point sources, becomes a vital 
pillar upon which quantum theory is built and by which many of the strange phenomena 
of quantum physics are explained. 
Figure 1.16: The electrostatic field of N point charges is the sum of the electric fields
of the individual charges.


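Box 1.3.5. The electric field of $n$ point charges $q_1, q_2, \ldots, q_n$, located at position vectors $\mathbf{r}_1, \mathbf{r}_2, \ldots, \mathbf{r}_n$, is $\mathbf{E} = \sum_{i=1}^{n} \frac{k_e q_i}{|\mathbf{r} - \mathbf{r}_i|^3}(\mathbf{r} - \mathbf{r}_i)$; and the analogous expression for the gravitational field of $n$ point masses $m_1, m_2, \ldots, m_n$ is $\mathbf{g} = -\sum_{i=1}^{n} \frac{G m_i}{|\mathbf{r} - \mathbf{r}_i|^3}(\mathbf{r} - \mathbf{r}_i)$.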
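The superposition sums in Box 1.3.5 translate directly into a loop over sources. The following is a minimal Python sketch, not a definitive implementation: the function names are hypothetical, and $k_e$ and $G$ default to 1 so that results come out in those units. The only difference between the electric and gravitational cases is the prefactor, $k_e q_i$ versus $-G m_i$.

```python
import math


def superposed_field(sources, r, coupling):
    """Sum over point sources of coupling(s) * (r - r_i)/|r - r_i|^3.

    sources:  list of (strength_i, r_i) pairs, r_i a 3-tuple
    coupling: maps a source strength to its prefactor, e.g. k_e*q for
              charges or -G*m for masses (Box 1.3.5)
    """
    total = [0.0, 0.0, 0.0]
    for s, ri in sources:
        d = [a - b for a, b in zip(r, ri)]        # field point minus source point
        dist3 = math.sqrt(sum(c * c for c in d)) ** 3
        pref = coupling(s) / dist3
        for k in range(3):
            total[k] += pref * d[k]
    return tuple(total)


def electric_field(charges, r, k_e=1.0):
    return superposed_field(charges, r, lambda q: k_e * q)


def gravitational_field(masses, r, G=1.0):
    return superposed_field(masses, r, lambda m: -G * m)


# A +1/-1 pair on the z-axis, evaluated at (0, 0, 3) in units with k_e = 1:
# contributions 2/2^3 = 0.25 and -4/4^3 = -0.0625 add to 0.1875.
E = electric_field([(1.0, (0.0, 0.0, 1.0)), (-1.0, (0.0, 0.0, -1.0))],
                   (0.0, 0.0, 3.0))
print(E)
```

Each source contributes as if the others were absent, which is exactly the content of the superposition principle.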


The concept of force has a fascinating history which started in the works of Galileo
around the beginning of the seventeenth century, was mathematically formulated and
precisely defined by Sir Isaac Newton in the second half of the seventeenth century,
was revised and redefined in the form of fields by Michael Faraday and James Maxwell
in the mid-nineteenth century, and was finally brought to its modern quantum field
theoretical form by Dirac, Heisenberg, Feynman, Schwinger, and others by the
mid-twentieth century.

Newton, in his theory of gravity, thought of gravitational force as “action- 
at-a-distance,” an agent which affects something that is “there” because of the 
influence of something that is “here.” This kind of interpretation of force had both 
philosophical and physical drawbacks. It is hard to accept a ghostlike influence 
on a distant object. Is there an agent that “carries” this influence? What is this 
agent, if any? Does the influence travel infinitely fast? If we remove the Sun from 
the Solar System would the Earth and other planets “feel” the absence of the Sun 
immediately? 

These questions, plus others, prompted physicists to come up with the idea of a
field. According to this interpretation, the Sun, by its mere presence, creates around
itself an invisible three-dimensional “sheet” such that, if any object is placed in this
sheet, it feels the gravitational force. Planets feel the force of gravity of the Sun
because they happen to be located in the gravitational field of the Sun. An apple
falls to the Earth because it is in the gravitational field of the Earth, not because of
some kind of action-at-a-distance ghost.
Therefore, according to this concept, the force acts on an object here because
there exists a field right here, and force becomes a local concept. The field concept
removes the difficulties associated with action-at-a-distance. The “agent” that
transmits the influence from the source to the object is the field. If the Sun is stolen
from the Solar System, the Earth will not feel the absence of the Sun immediately.
It will receive the information of such cosmic burglary after a certain time lapse
corresponding to the time required for the disturbance to travel from the Sun to the
Earth. We can liken such a disturbance (disappearance of the Sun) to a disturbance
in the smooth water of a quiet pond (by dropping a stone into it). Clearly, the
disturbance travels from the source (where the stone was dropped) to any other point
with a finite speed, the speed of the water waves.

The concept of a field was actually introduced first in the context of electricity
and magnetism by Michael Faraday as a means of “visualizing” electromagnetic
effects to replace certain mathematical ideas for which he had little talent. However,
in the hands of James Maxwell, fields were molded into a physical entity having an
existence of their own in the form of electromagnetic waves, which were produced by
Hertz in 1887 and used by Marconi in 1901 in the development of radio.


A concept related to that of fields is potential, which is closely tied to the
work done by the fields on a charge (in the case of electrostatics) or a mass
(in the case of gravity). It can be shown¹⁵ that the gravitational potential
$\Phi(\mathbf{r})$ at $\mathbf{r}$ of $n$ point masses is given by

$$\Phi(\mathbf{r}) = -\sum_{i=1}^{n} \frac{G m_i}{|\mathbf{r} - \mathbf{r}_i|}, \tag{1.28}$$

and that of $n$ point charges by

$$\Phi(\mathbf{r}) = \sum_{i=1}^{n} \frac{k_e q_i}{|\mathbf{r} - \mathbf{r}_i|}. \tag{1.29}$$

Note that in both cases, the potential goes to zero as $r$ goes to infinity. This
has to do with the choice of the location of the zero of potential, which we
have chosen to be the point at infinity in Equations (1.28) and (1.29).

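Example 1.3.4. The electric charges $q_1$, $q_2$, $q_3$, and $q_4$ are located at Cartesian coordinates
$(a, 0, 0)$, $(0, a, 0)$, $(-a, 0, 0)$, and $(0, -a, 0)$, respectively. We want to find the electric
field and the electrostatic potential at an arbitrary point on the $z$-axis. We note
that

$$\mathbf{r}_1 = a\mathbf{e}_x,\quad \mathbf{r}_2 = a\mathbf{e}_y,\quad \mathbf{r}_3 = -a\mathbf{e}_x,\quad \mathbf{r}_4 = -a\mathbf{e}_y,\quad \mathbf{r} = z\mathbf{e}_z,$$

so that

$$\mathbf{r} - \mathbf{r}_1 = -a\mathbf{e}_x + z\mathbf{e}_z,\qquad \mathbf{r} - \mathbf{r}_2 = -a\mathbf{e}_y + z\mathbf{e}_z,$$

$$\mathbf{r} - \mathbf{r}_3 = a\mathbf{e}_x + z\mathbf{e}_z,\qquad \mathbf{r} - \mathbf{r}_4 = a\mathbf{e}_y + z\mathbf{e}_z,$$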


15 See Chapter 14 for details. 


potential 
and $|\mathbf{r} - \mathbf{r}_i|^3 = (a^2 + z^2)^{3/2}$ for all $i$. The electric field can now be calculated using
Box 1.3.5:

$$\begin{aligned}
\mathbf{E} &= \frac{k_e q_1}{(a^2+z^2)^{3/2}}(-a\mathbf{e}_x + z\mathbf{e}_z) + \frac{k_e q_2}{(a^2+z^2)^{3/2}}(-a\mathbf{e}_y + z\mathbf{e}_z)\\
&\quad + \frac{k_e q_3}{(a^2+z^2)^{3/2}}(a\mathbf{e}_x + z\mathbf{e}_z) + \frac{k_e q_4}{(a^2+z^2)^{3/2}}(a\mathbf{e}_y + z\mathbf{e}_z)\\
&= \frac{k_e}{(a^2+z^2)^{3/2}}\left[(-aq_1 + aq_3)\mathbf{e}_x + (-aq_2 + aq_4)\mathbf{e}_y + (q_1 + q_2 + q_3 + q_4)z\mathbf{e}_z\right].
\end{aligned}$$

It is interesting to note that if the sum of all charges is zero, the $z$-component of
the electric field vanishes at all points on the $z$-axis. Furthermore, if, in addition,
$q_1 = q_3$ and $q_2 = q_4$, there will be no electric field at any point on the $z$-axis.

The potential is obtained similarly:

$$\Phi = \frac{k_e q_1}{(a^2+z^2)^{1/2}} + \frac{k_e q_2}{(a^2+z^2)^{1/2}} + \frac{k_e q_3}{(a^2+z^2)^{1/2}} + \frac{k_e q_4}{(a^2+z^2)^{1/2}} = \frac{k_e(q_1 + q_2 + q_3 + q_4)}{\sqrt{a^2+z^2}}.$$

So, the potential is zero at all points of the $z$-axis, as long as the total charge
is zero. ■
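The closed-form result of Example 1.3.4 can be checked against a brute-force superposition sum. This is a minimal sketch in units with $k_e = 1$; the function names and the particular test charges are hypothetical choices, not part of the example.

```python
import math


def field_and_potential(charges, r, k_e=1.0):
    """Direct superposition: E from Box 1.3.5 and Phi from Equation (1.29)."""
    E = [0.0, 0.0, 0.0]
    phi = 0.0
    for q, ri in charges:
        d = [a - b for a, b in zip(r, ri)]
        dist = math.sqrt(sum(c * c for c in d))
        phi += k_e * q / dist
        for k in range(3):
            E[k] += k_e * q * d[k] / dist ** 3
    return tuple(E), phi


def example_134(q, a, z, k_e=1.0):
    """Closed-form E and Phi of Example 1.3.4 at the point (0, 0, z)."""
    q1, q2, q3, q4 = q
    s = k_e / (a * a + z * z) ** 1.5
    E = (s * a * (q3 - q1), s * a * (q4 - q2), s * z * (q1 + q2 + q3 + q4))
    phi = k_e * (q1 + q2 + q3 + q4) / math.sqrt(a * a + z * z)
    return E, phi


a, z = 1.5, 0.7
positions = [(a, 0.0, 0.0), (0.0, a, 0.0), (-a, 0.0, 0.0), (0.0, -a, 0.0)]
q = (1.0, -2.0, 0.5, 3.0)            # arbitrary test charges (hypothetical)
E_sum, phi_sum = field_and_potential(list(zip(q, positions)), (0.0, 0.0, z))
E_cf, phi_cf = example_134(q, a, z)
print(E_sum, E_cf)
print(phi_sum, phi_cf)
```

Setting the charges to $(1, -1, 1, -1)$ satisfies both conditions discussed above ($q_1 = q_3$, $q_2 = q_4$, zero total charge), and the direct sum then gives a vanishing field and potential on the $z$-axis.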


1.3.2 Cross Product 


angular 

momentum and 
torque as 
examples of cross 
products 


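The unit vectors in the three coordinate systems are not only mutually perpendicular,
but in the order in which they are given, they also form a right-handed
set [see Equation (1.5)]. Therefore, we can use Equation (1.6) and write

$$\mathbf{a}\times\mathbf{b} = \underbrace{\det\begin{pmatrix}\mathbf{e}_x & \mathbf{e}_y & \mathbf{e}_z\\ a_x & a_y & a_z\\ b_x & b_y & b_z\end{pmatrix}}_{\text{in Cartesian CS}} = \underbrace{\det\begin{pmatrix}\mathbf{e}_\rho & \mathbf{e}_\varphi & \mathbf{e}_z\\ a_\rho & a_\varphi & a_z\\ b_\rho & b_\varphi & b_z\end{pmatrix}}_{\text{in cylindrical CS}} = \underbrace{\det\begin{pmatrix}\mathbf{e}_r & \mathbf{e}_\theta & \mathbf{e}_\varphi\\ a_r & a_\theta & a_\varphi\\ b_r & b_\theta & b_\varphi\end{pmatrix}}_{\text{in spherical CS}} \tag{1.30}$$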

Two important prototypes of the concept of cross product are angular
momentum and torque. A particle moving with instantaneous linear momentum
$\mathbf{p}$ relative to an origin $O$ has instantaneous angular momentum
$\mathbf{L} = \mathbf{r}\times\mathbf{p}$ if its instantaneous position vector with respect to $O$ is $\mathbf{r}$. In
Figure 1.17 we have shown $\mathbf{r}$, $\mathbf{p}$, and $\mathbf{r}\times\mathbf{p}$. Similarly, if the instantaneous
force on the above particle is $\mathbf{F}$, then the instantaneous torque acting on it is
$\mathbf{T} = \mathbf{r}\times\mathbf{F}$.

If there is more than one particle, we simply add the contributions of
individual particles. Thus, the total angular momentum $\mathbf{L}$ of $N$ particles and
the total torque $\mathbf{T}$ acting on them are

$$\mathbf{L} = \sum_{k=1}^{N}\mathbf{r}_k\times\mathbf{p}_k \quad\text{and}\quad \mathbf{T} = \sum_{k=1}^{N}\mathbf{r}_k\times\mathbf{F}_k, \tag{1.31}$$

where $\mathbf{r}_k$ is the position of the $k$th particle, $\mathbf{p}_k$ its instantaneous momentum,
and $\mathbf{F}_k$ the instantaneous force acting on it.
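Expanding the Cartesian determinant of (1.30) along its top row gives the familiar component formula for the cross product, and (1.31) is then just a sum of such products. The sketch below is illustrative (the function names are hypothetical); it uses Cartesian components only, because those unit vectors are the same at every particle's location.

```python
def cross(a, b):
    """Cartesian cross product: expand the first determinant in (1.30)
    along its top row (e_x, e_y, e_z)."""
    ax, ay, az = a
    bx, by, bz = b
    return (ay * bz - az * by,   # e_x component
            az * bx - ax * bz,   # e_y component (cofactor carries a minus sign)
            ax * by - ay * bx)   # e_z component


def total_moment(positions, vectors):
    """Sum_k r_k x w_k: with w_k = p_k this is L, with w_k = F_k it is T (1.31)."""
    total = (0.0, 0.0, 0.0)
    for r, w in zip(positions, vectors):
        c = cross(r, w)
        total = tuple(t + ci for t, ci in zip(total, c))
    return total


# Two particles in the xy-plane, both circulating counterclockwise about O.
positions = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
momenta = [(0.0, 2.0, 0.0), (-3.0, 0.0, 0.0)]
print(total_moment(positions, momenta))   # (0.0, 0.0, 5.0)
```

Both particles contribute along $+\mathbf{e}_z$ (2 and 3 units), consistent with the right-hand rule for counterclockwise motion viewed from above.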
Figure 1.17: Angular momentum of a moving particle with respect to the origin O.
The circle with a dot in its middle represents a vector pointing out of the page. It is
assumed that r and p lie in the page.


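Example 1.3.5. In this example, we show that the torque on a collection of three
particles is caused by external forces only. The torques due to the internal forces
add up to zero. The generalization to an arbitrary number of particles will be done
in Example 9.2.1 when we learn how to manipulate summation symbols.

For $N = 3$, the second formula in Equation (1.31) reduces to

$$\mathbf{T} = \mathbf{r}_1\times\mathbf{F}_1 + \mathbf{r}_2\times\mathbf{F}_2 + \mathbf{r}_3\times\mathbf{F}_3.$$

Each force can be divided into an external part and an internal part, the latter being
the force caused by the presence of the other particles. So, we have

$$\begin{aligned}
\mathbf{F}_1 &= \mathbf{F}_1^{(\text{ext})} + \mathbf{F}_{12} + \mathbf{F}_{13},\\
\mathbf{F}_2 &= \mathbf{F}_2^{(\text{ext})} + \mathbf{F}_{21} + \mathbf{F}_{23},\\
\mathbf{F}_3 &= \mathbf{F}_3^{(\text{ext})} + \mathbf{F}_{31} + \mathbf{F}_{32},
\end{aligned}$$

where $\mathbf{F}_{12}$ is the force on particle 1 exerted by particle 2, etc. Substituting in the
above expression for the torque, we get

$$\begin{aligned}
\mathbf{T} &= \mathbf{r}_1\times\mathbf{F}_1^{(\text{ext})} + \mathbf{r}_2\times\mathbf{F}_2^{(\text{ext})} + \mathbf{r}_3\times\mathbf{F}_3^{(\text{ext})}\\
&\quad + \mathbf{r}_1\times\mathbf{F}_{12} + \mathbf{r}_1\times\mathbf{F}_{13} + \mathbf{r}_2\times\mathbf{F}_{21} + \mathbf{r}_2\times\mathbf{F}_{23} + \mathbf{r}_3\times\mathbf{F}_{31} + \mathbf{r}_3\times\mathbf{F}_{32}\\
&= \mathbf{T}^{(\text{ext})} + (\mathbf{r}_1 - \mathbf{r}_2)\times\mathbf{F}_{12} + (\mathbf{r}_1 - \mathbf{r}_3)\times\mathbf{F}_{13} + (\mathbf{r}_2 - \mathbf{r}_3)\times\mathbf{F}_{23},
\end{aligned}$$

where we used the third law of motion: $\mathbf{F}_{12} = -\mathbf{F}_{21}$, etc. Now we note that the
internal force between two particles, 1 and 2 say, is along the line joining them, i.e.,
along $\mathbf{r}_1 - \mathbf{r}_2$. It follows that all the cross products in the last line of the equation
above vanish and we get $\mathbf{T} = \mathbf{T}^{(\text{ext})}$. ■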
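The cancellation in Example 1.3.5 can be seen numerically. In this minimal sketch the internal forces are modeled, as an assumption for illustration, by $\mathbf{F}_{ij} = c_{ij}(\mathbf{r}_i - \mathbf{r}_j)$ with symmetric couplings $c_{ij} = c_{ji}$; this satisfies both ingredients the example uses (the third law and forces along the line joining the particles). Positions, external forces, and couplings are random or arbitrary test data.

```python
import random


def cross(a, b):
    """Cartesian cross product a x b."""
    ax, ay, az = a
    bx, by, bz = b
    return (ay * bz - az * by, az * bx - ax * bz, ax * by - ay * bx)


def add(u, v):
    return tuple(x + y for x, y in zip(u, v))


def sub(u, v):
    return tuple(x - y for x, y in zip(u, v))


def scale(c, u):
    return tuple(c * x for x in u)


def torques(seed=1):
    """Return (total torque, external torque) for three particles.

    Internal forces are taken to be central, F_ij = c_ij (r_i - r_j) with
    c_ij = c_ji, so F_ji = -F_ij (third law) and F_ij lies along r_i - r_j.
    Positions, external forces, and couplings c_ij are arbitrary test data.
    """
    rng = random.Random(seed)

    def vec():
        return tuple(rng.uniform(-1.0, 1.0) for _ in range(3))

    r = [vec() for _ in range(3)]
    f_ext = [vec() for _ in range(3)]
    c = {(0, 1): 2.0, (0, 2): -0.5, (1, 2): 1.3}
    T_total = T_ext = (0.0, 0.0, 0.0)
    for i in range(3):
        F = f_ext[i]
        for j in range(3):
            if j != i:
                cij = c[(min(i, j), max(i, j))]
                F = add(F, scale(cij, sub(r[i], r[j])))   # internal F_ij
        T_total = add(T_total, cross(r[i], F))
        T_ext = add(T_ext, cross(r[i], f_ext[i]))
    return T_total, T_ext


T_total, T_ext = torques()
print(T_total)
print(T_ext)   # agrees with T_total up to rounding error
```

If the internal forces were given a component perpendicular to $\mathbf{r}_i - \mathbf{r}_j$, the two printed torques would in general differ, which is why the "along the line joining them" assumption matters.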

We have already seen that multiplying a vector by a number gives another
vector. A physical example of this is electric force, which is obtained by multiplying
electric field by electric charge. In fact, we divided the electric force by
charge to get the electric field. Historically, the law of the force was discovered
first, and the concept of electric field was defined afterward. We have
also seen that one can get a new vector by cross-multiplying two vectors. The
rule of this kind of multiplication is, however, more complicated. It turns out
that the magnetic force is related to the magnetic field via such a cross
multiplication. What is worse is that the magnetic field is also related to its source
(electric charges in motion) via such a product. Little wonder that magnetic
phenomena are mathematically so much more complicated than their electric
counterparts. That is why in the study of magnetism, one first introduces
the concept of magnetic field and how it is related to the motion of charges
producing it, and then the force of this field on moving charges.

from electric field
to electric force

magnetic field of a
moving charge or
Biot-Savart law

magnetic force on
a moving charge.


Example 1.3.6. A charge $q$, located instantaneously at the origin, is moving
with velocity $\mathbf{v}$ relative to $P$ [see Figure 1.18(a)]. Assuming that $|\mathbf{v}|$ is much smaller
than the speed of light, the instantaneous magnetic field at $P$ due to $q$ is given by

$$\mathbf{B} = \frac{k_m q\,\mathbf{v}\times\mathbf{e}_r}{r^2}, \quad\text{or, using } \mathbf{e}_r = \mathbf{r}/r, \text{ by}\quad \mathbf{B} = \frac{k_m q\,\mathbf{v}\times\mathbf{r}}{r^3}.$$

This is a simple version of a more general formula known as the Biot-Savart law. In the above relations, $k_m$
is the analog of $k_e$ in the electric case.

If we are interested in the magnetic field when $q$ is located at a point other
than the origin, we replace $\mathbf{r}$ with the vector from the instantaneous location of the
moving charge to $P$. This is shown in Figure 1.18(b), where the vector from $q_1$ to
$P$ is to replace $\mathbf{r}$ in the above equation. More specifically, we have

$$\mathbf{B}_1 = \frac{k_m q_1\,\mathbf{v}_1\times(\mathbf{r} - \mathbf{r}_1)}{|\mathbf{r} - \mathbf{r}_1|^3}. \tag{1.32}$$

If there are $N$ charges, the total magnetic field will be

$$\mathbf{B} = \sum_{k=1}^{N} \frac{k_m q_k\,\mathbf{v}_k\times(\mathbf{r} - \mathbf{r}_k)}{|\mathbf{r} - \mathbf{r}_k|^3}, \tag{1.33}$$

where we have used the superposition principle.


When a charge $q$ moves with velocity $\mathbf{v}$ in a magnetic field $\mathbf{B}$, it experiences
a force given by

$$\mathbf{F} = q\mathbf{v}\times\mathbf{B}. \tag{1.34}$$

It is instructive to write the magnetic force exerted by a charge $q_1$ moving
with velocity $\mathbf{v}_1$ on a second charge $q_2$ moving with velocity $\mathbf{v}_2$. We leave this
as an exercise for the reader.
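Equations (1.33) and (1.34) combine naturally in code: first superpose the Biot-Savart contributions, then take $q\mathbf{v}\times\mathbf{B}$. The sketch below works in units with $k_m = 1$ (an assumption) and, like the text, is only meaningful when all speeds are much smaller than the speed of light; `magnetic_field` and `magnetic_force` are illustrative names.

```python
import math


def cross(a, b):
    ax, ay, az = a
    bx, by, bz = b
    return (ay * bz - az * by, az * bx - ax * bz, ax * by - ay * bx)


def magnetic_field(charges, r, k_m=1.0):
    """Equation (1.33): B = sum_k k_m q_k v_k x (r - r_k)/|r - r_k|^3.

    charges: list of (q_k, r_k, v_k) giving each charge's instantaneous
    position and velocity; only meaningful when every |v_k| << c.
    """
    B = [0.0, 0.0, 0.0]
    for q, rk, vk in charges:
        d = tuple(a - b for a, b in zip(r, rk))
        dist3 = math.sqrt(sum(c * c for c in d)) ** 3
        vxd = cross(vk, d)
        for i in range(3):
            B[i] += k_m * q * vxd[i] / dist3
    return tuple(B)


def magnetic_force(q, v, B):
    """Equation (1.34): F = q v x B."""
    return tuple(q * c for c in cross(v, B))


# A unit charge at the origin moving along +z; field point on the x-axis.
B = magnetic_field([(1.0, (0.0, 0.0, 0.0), (0.0, 0.0, 1.0))], (1.0, 0.0, 0.0))
F = magnetic_force(2.0, (1.0, 0.0, 0.0), B)   # second charge moving along +x
print(B, F)   # (0.0, 1.0, 0.0) (0.0, 0.0, 2.0)
```

Because of the cross products, $\mathbf{B}$ is always perpendicular to the source velocity and $\mathbf{F}$ is always perpendicular to the velocity of the charge it acts on.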




Figure 1.18: The (instantaneous) magnetic field at P of a moving point charge (a)
when the charge is at the origin, and (b) when the charge is at a point other than the origin.
The field points out of the page for the configuration shown.
Example 1.3.7. A charge $q$ moves with constant speed $v$ (assumed to be small
compared to the speed of light) on a straight line. We want to calculate the magnetic
field produced by the charge at a point $P$ located at a distance $\rho$ from the line as a
function of time. Cylindrical coordinates are most suitable for this problem because
of the existence of a natural axis. Choose the path of the charge to be the $z$-axis.
Also assume that $P$ lies in the $xy$-plane, and that $q$ was at the origin at $t = 0$. Then
$\mathbf{v} = v\mathbf{e}_z$, $\mathbf{r} = \rho\mathbf{e}_\rho$, $\mathbf{r}_1 = vt\,\mathbf{e}_z$, and $\mathbf{r} - \mathbf{r}_1 = \rho\mathbf{e}_\rho - vt\,\mathbf{e}_z$. So

$$|\mathbf{r} - \mathbf{r}_1| = \sqrt{(\rho\mathbf{e}_\rho - vt\,\mathbf{e}_z)\cdot(\rho\mathbf{e}_\rho - vt\,\mathbf{e}_z)} = \sqrt{\rho^2 + v^2t^2}$$

and $\mathbf{v}\times(\mathbf{r} - \mathbf{r}_1) = v\mathbf{e}_z\times(\rho\mathbf{e}_\rho - vt\,\mathbf{e}_z) = \rho v\,\mathbf{e}_\varphi$. Therefore, the magnetic field is

$$\mathbf{B} = \frac{k_m q\,\mathbf{v}\times(\mathbf{r} - \mathbf{r}_1)}{|\mathbf{r} - \mathbf{r}_1|^3} = \frac{k_m q\rho v}{(\rho^2 + v^2t^2)^{3/2}}\,\mathbf{e}_\varphi.$$

Readers familiar with the relation between magnetic fields and currents in long wires
will note that the magnetic field above obeys the right-hand rule. ■
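A direct Biot-Savart evaluation confirms the result of Example 1.3.7 at an arbitrary azimuth. This is a sketch in units with $k_m = 1$ (assumed), and the sample values of $q$, $v$, $\rho$, $\varphi$, and $t$ are arbitrary. It projects the computed field onto $\mathbf{e}_\rho$, $\mathbf{e}_\varphi$, and $\mathbf{e}_z$ and checks that only the $\mathbf{e}_\varphi$ component survives.

```python
import math


def cross(a, b):
    ax, ay, az = a
    bx, by, bz = b
    return (ay * bz - az * by, az * bx - ax * bz, ax * by - ay * bx)


def dot(u, w):
    return sum(x * y for x, y in zip(u, w))


def b_direct(q, v, rho, phi, t, k_m=1.0):
    """Biot-Savart field at P = (rho, phi, 0) of a charge at (0, 0, v t)
    moving with velocity v e_z."""
    r = (rho * math.cos(phi), rho * math.sin(phi), 0.0)
    d = (r[0], r[1], r[2] - v * t)                   # r - r_1
    dist3 = math.sqrt(dot(d, d)) ** 3
    return tuple(k_m * q * c / dist3 for c in cross((0.0, 0.0, v), d))


def b_phi_closed_form(q, v, rho, t, k_m=1.0):
    """Example 1.3.7: B = k_m q rho v / (rho^2 + v^2 t^2)^(3/2) along e_phi."""
    return k_m * q * rho * v / (rho ** 2 + (v * t) ** 2) ** 1.5


q, v, rho, phi, t = 1.0, 0.3, 2.0, 0.8, 5.0          # arbitrary sample values
B = b_direct(q, v, rho, phi, t)
e_rho = (math.cos(phi), math.sin(phi), 0.0)
e_phi = (-math.sin(phi), math.cos(phi), 0.0)
print(dot(B, e_phi), b_phi_closed_form(q, v, rho, t))
print(dot(B, e_rho), B[2])                           # both essentially zero
```

The field is largest at $t = 0$, when the charge passes closest to $P$, and falls off as $|t|^{-3}$ for large $|t|$.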


1.4 Relations Among Unit Vectors 

We have seen that, depending on the nature of problems encountered in
physics, one coordinate system may be more useful than others. We have
also seen that the coordinates can be transformed back and forth using functional
relations that connect them. Since many physical quantities are vectors,
transformation and expression of components in bases of various coordinate
systems also become important. The key to this transformation is writing one
set of unit vectors in terms of others. In the derivation of these relations, we
shall make heavy use of Box 1.1.2.

First we write the cylindrical unit vectors in terms of Cartesian unit vectors.
Since $\{\mathbf{e}_x, \mathbf{e}_y, \mathbf{e}_z\}$ form a basis, any vector can be written in terms of
them. In particular, $\mathbf{e}_\rho$ can be expressed as

$$\mathbf{e}_\rho = a_1\mathbf{e}_x + b_1\mathbf{e}_y + c_1\mathbf{e}_z, \tag{1.35}$$

with $a_1$, $b_1$, and $c_1$ to be determined. Next we recall that


Box 1.4.1. The dot product of two unit vectors is the cosine of the angle 
between them. 


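Furthermore, Figure 1.12 shows that the angle between $\mathbf{e}_\rho$ and $\mathbf{e}_x$ is $\varphi$ and
that between $\mathbf{e}_\rho$ and $\mathbf{e}_y$ is $\pi/2 - \varphi$. So, by dotting both sides of Equation
(1.35) by $\mathbf{e}_x$, $\mathbf{e}_y$, and $\mathbf{e}_z$ in succession, we obtain

$$\begin{aligned}
\underbrace{\mathbf{e}_x\cdot\mathbf{e}_\rho}_{=\cos\varphi} &= a_1 + 0 + 0 = a_1 &&\Rightarrow\quad a_1 = \cos\varphi,\\
\underbrace{\mathbf{e}_y\cdot\mathbf{e}_\rho}_{=\sin\varphi} &= 0 + b_1 + 0 = b_1 &&\Rightarrow\quad b_1 = \sin\varphi,\\
\underbrace{\mathbf{e}_z\cdot\mathbf{e}_\rho}_{=0} &= 0 + 0 + c_1 = c_1 &&\Rightarrow\quad c_1 = 0.
\end{aligned}$$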
Therefore,

$$\mathbf{e}_\rho = \mathbf{e}_x\cos\varphi + \mathbf{e}_y\sin\varphi.$$

With the first and third cylindrical unit vectors $\mathbf{e}_\rho$ and $\mathbf{e}_z$ at our disposal,¹⁶
we can determine the second, using Equation (1.5):


cylindrical unit 
vectors in terms of 
Cartesian unit 
vectors 


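$$\mathbf{e}_\varphi = \mathbf{e}_z\times\mathbf{e}_\rho = \det\begin{pmatrix}\mathbf{e}_x & \mathbf{e}_y & \mathbf{e}_z\\ 0 & 0 & 1\\ \cos\varphi & \sin\varphi & 0\end{pmatrix} = -\mathbf{e}_x\sin\varphi + \mathbf{e}_y\cos\varphi.$$

Thus,

$$\begin{aligned}
\mathbf{e}_\rho &= \mathbf{e}_x\cos\varphi + \mathbf{e}_y\sin\varphi,\\
\mathbf{e}_\varphi &= -\mathbf{e}_x\sin\varphi + \mathbf{e}_y\cos\varphi,\\
\mathbf{e}_z &= \mathbf{e}_z.
\end{aligned}\tag{1.36}$$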
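A quick way to confirm Equation (1.36) is to check numerically that the three vectors are orthonormal and right-handed at some azimuth. This minimal sketch uses an arbitrary angle; the helper names are hypothetical.

```python
import math


def cross(a, b):
    ax, ay, az = a
    bx, by, bz = b
    return (ay * bz - az * by, az * bx - ax * bz, ax * by - ay * bx)


def dot(u, w):
    return sum(x * y for x, y in zip(u, w))


def cylindrical_unit_vectors(phi):
    """Cartesian components of (e_rho, e_phi, e_z), Equation (1.36)."""
    e_rho = (math.cos(phi), math.sin(phi), 0.0)
    e_phi = (-math.sin(phi), math.cos(phi), 0.0)
    e_z = (0.0, 0.0, 1.0)
    return e_rho, e_phi, e_z


phi = 2.1                                    # any azimuthal angle will do
e_rho, e_phi, e_z = cylindrical_unit_vectors(phi)
basis = (e_rho, e_phi, e_z)

# Gram matrix of pairwise dot products: the identity for an orthonormal set.
gram = [[dot(u, w) for w in basis] for u in basis]
print(gram)
print(cross(e_z, e_rho), e_phi)              # e_z x e_rho reproduces e_phi
```

The dot products with $\mathbf{e}_x$, namely $\cos\varphi$ and $-\sin\varphi$, are exactly the coefficients used when inverting (1.36).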


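Equation (1.36) can easily be inverted to find the Cartesian unit vectors in
terms of the cylindrical unit vectors. For example, the coefficients in

$$\mathbf{e}_x = a_2\mathbf{e}_\rho + b_2\mathbf{e}_\varphi + c_2\mathbf{e}_z$$

can be obtained by dotting both sides of it with $\mathbf{e}_\rho$, $\mathbf{e}_\varphi$, and $\mathbf{e}_z$, respectively:

$$\begin{aligned}
\mathbf{e}_\rho\cdot\mathbf{e}_x &= a_2 + 0 + 0 &&\Rightarrow\quad \cos\varphi = a_2,\\
\mathbf{e}_\varphi\cdot\mathbf{e}_x &= 0 + b_2 + 0 &&\Rightarrow\quad -\sin\varphi = b_2,\\
\mathbf{e}_z\cdot\mathbf{e}_x &= 0 + 0 + c_2 &&\Rightarrow\quad 0 = c_2,
\end{aligned}$$

where we have used $\mathbf{e}_\rho\cdot\mathbf{e}_x = \cos\varphi$ and $\mathbf{e}_\varphi\cdot\mathbf{e}_x = -\sin\varphi$ (obtained by
dotting the first and second equations of (1.36) with $\mathbf{e}_x$) as well as $\mathbf{e}_z\cdot\mathbf{e}_x = 0$.
Similarly, one can obtain $\mathbf{e}_y$ in terms of the cylindrical unit vectors. The entire
result is

$$\begin{aligned}
\mathbf{e}_x &= \mathbf{e}_\rho\cos\varphi - \mathbf{e}_\varphi\sin\varphi,\\
\mathbf{e}_y &= \mathbf{e}_\rho\sin\varphi + \mathbf{e}_\varphi\cos\varphi,\\
\mathbf{e}_z &= \mathbf{e}_z.
\end{aligned}\tag{1.37}$$

Cartesian unit
vectors in terms of
cylindrical unit
vectors

Now we express the spherical unit vectors in terms of the cylindrical ones.
This is easily done for $\mathbf{e}_r$, because it has only $\mathbf{e}_\rho$ and $\mathbf{e}_z$ components (why?).
Thus, with

$$\mathbf{e}_r = a_3\mathbf{e}_\rho + b_3\mathbf{e}_z,$$

we obtain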


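$$\begin{aligned}
\mathbf{e}_\rho\cdot\mathbf{e}_r &= a_3 + 0 &&\Rightarrow\quad a_3 = \sin\theta,\\
\mathbf{e}_z\cdot\mathbf{e}_r &= 0 + b_3 &&\Rightarrow\quad b_3 = \cos\theta,
\end{aligned}$$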


16 Remember that $\mathbf{e}_z$ is a unit vector in both coordinate systems. So, one can say that
the cylindrical $\mathbf{e}_z$ has components $(0, 0, 1)$ in the Cartesian basis $\{\mathbf{e}_x, \mathbf{e}_y, \mathbf{e}_z\}$.
where in the last step of each line, we used the fact that the angle between $\mathbf{e}_r$
and $\mathbf{e}_z$ is $\theta$ and that between $\mathbf{e}_r$ and $\mathbf{e}_\rho$ is $\pi/2 - \theta$ (see Figure 1.13). With $a_3$
and $b_3$ so determined, we can write

$$\mathbf{e}_r = \mathbf{e}_\rho\sin\theta + \mathbf{e}_z\cos\theta.$$

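Having two spherical unit vectors $\mathbf{e}_r$ and $\mathbf{e}_\varphi$ at our disposal,¹⁷ we can
determine the third one, using (1.5) and (1.30):

$$\mathbf{e}_\theta = \mathbf{e}_\varphi\times\mathbf{e}_r = \det\begin{pmatrix}\mathbf{e}_\rho & \mathbf{e}_\varphi & \mathbf{e}_z\\ 0 & 1 & 0\\ \sin\theta & 0 & \cos\theta\end{pmatrix} = \mathbf{e}_\rho\cos\theta - \mathbf{e}_z\sin\theta.$$

Thus,

$$\begin{aligned}
\mathbf{e}_r &= \mathbf{e}_\rho\sin\theta + \mathbf{e}_z\cos\theta,\\
\mathbf{e}_\theta &= \mathbf{e}_\rho\cos\theta - \mathbf{e}_z\sin\theta.
\end{aligned}\tag{1.38}$$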


The inverse relations can be obtained as before. We leave the details of 
the calculation as an exercise for the reader. 

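Combining Equations (1.36) and (1.38), we can express spherical unit vectors
in terms of the Cartesian unit vectors:

$$\begin{aligned}
\mathbf{e}_r &= \mathbf{e}_x\sin\theta\cos\varphi + \mathbf{e}_y\sin\theta\sin\varphi + \mathbf{e}_z\cos\theta,\\
\mathbf{e}_\theta &= \mathbf{e}_x\cos\theta\cos\varphi + \mathbf{e}_y\cos\theta\sin\varphi - \mathbf{e}_z\sin\theta,\\
\mathbf{e}_\varphi &= -\mathbf{e}_x\sin\varphi + \mathbf{e}_y\cos\varphi.
\end{aligned}\tag{1.39}$$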
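Equation (1.39) can be sanity-checked the same way: the three vectors should be orthonormal, they should satisfy $\mathbf{e}_r\times\mathbf{e}_\theta = \mathbf{e}_\varphi$, and $\mathbf{e}_r$ should agree with what (1.38) gives once $\mathbf{e}_\rho$ is taken from (1.36). This is a minimal sketch with arbitrary sample angles; the helper names are hypothetical.

```python
import math


def cross(a, b):
    ax, ay, az = a
    bx, by, bz = b
    return (ay * bz - az * by, az * bx - ax * bz, ax * by - ay * bx)


def dot(u, w):
    return sum(x * y for x, y in zip(u, w))


def spherical_unit_vectors(theta, phi):
    """Cartesian components of (e_r, e_theta, e_phi), Equation (1.39)."""
    st, ct = math.sin(theta), math.cos(theta)
    sp, cp = math.sin(phi), math.cos(phi)
    e_r = (st * cp, st * sp, ct)
    e_theta = (ct * cp, ct * sp, -st)
    e_phi = (-sp, cp, 0.0)
    return e_r, e_theta, e_phi


theta, phi = 0.7, 2.4                        # arbitrary sample angles
e_r, e_theta, e_phi = spherical_unit_vectors(theta, phi)

# Cross-check against (1.38): e_r = e_rho sin(theta) + e_z cos(theta),
# with e_rho taken from (1.36).
e_rho, e_z = (math.cos(phi), math.sin(phi), 0.0), (0.0, 0.0, 1.0)
e_r_from_138 = tuple(a * math.sin(theta) + b * math.cos(theta)
                     for a, b in zip(e_rho, e_z))

print(cross(e_r, e_theta), e_phi)            # right-handed: e_r x e_theta = e_phi
print(e_r_from_138, e_r)
```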

Equations (1.39) and (1.36) are very useful when calculating vector quantities
in spherical and cylindrical coordinates, as we shall see in many examples
to follow. These equations also allow us to express a unit vector in one of the
three coordinate systems in terms of the unit vectors of any other coordinate
system.

Example 1.4.1. $P_1$ and $P_2$ have Cartesian coordinates $(1, 1, 1)$ and $(-1, 2, -1)$,
respectively. A vector $\mathbf{a}$ has spherical components $(0, 2, 0)$ at $P_1$. We want to find
the spherical components of $\mathbf{a}$ at $P_2$. These are given by $\mathbf{a}\cdot\mathbf{e}_{r_2}$, $\mathbf{a}\cdot\mathbf{e}_{\theta_2}$, and $\mathbf{a}\cdot\mathbf{e}_{\varphi_2}$.
In order to calculate these dot products, it is most convenient to express all vectors
in Cartesian form. So, using Equation (1.39), we have

$$\mathbf{a} = 2\mathbf{e}_{\theta_1} = 2(\mathbf{e}_x\cos\theta_1\cos\varphi_1 + \mathbf{e}_y\cos\theta_1\sin\varphi_1 - \mathbf{e}_z\sin\theta_1),$$

where $(r_1, \theta_1, \varphi_1)$ are coordinates of $P_1$. We can calculate these from the Cartesian
coordinates of $P_1$:

$$r_1 = \sqrt{1^2 + 1^2 + 1^2} = \sqrt{3},\qquad \cos\theta_1 = \frac{z_1}{r_1} = \frac{1}{\sqrt3},\qquad \tan\varphi_1 = \frac{y_1}{x_1} = 1 \;\Rightarrow\; \varphi_1 = \frac{\pi}{4}.$$

17 Recall that $\mathbf{e}_\varphi$ is both a cylindrical and a spherical unit vector.


spherical unit 
vectors in terms of 
cylindrical unit 
vectors 


spherical unit 
vectors in terms of 
Cartesian unit 
vectors 
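Therefore, since $\sin\theta_1 = \sqrt2/\sqrt3$ and $\cos\varphi_1 = \sin\varphi_1 = 1/\sqrt2$,

$$\mathbf{a} = 2\left(\mathbf{e}_x\frac{1}{\sqrt3}\frac{1}{\sqrt2} + \mathbf{e}_y\frac{1}{\sqrt3}\frac{1}{\sqrt2} - \mathbf{e}_z\frac{\sqrt2}{\sqrt3}\right) = \frac{2}{\sqrt6}\mathbf{e}_x + \frac{2}{\sqrt6}\mathbf{e}_y - \frac{4}{\sqrt6}\mathbf{e}_z.$$

Now we need to express $\mathbf{e}_{r_2}$, $\mathbf{e}_{\theta_2}$, and $\mathbf{e}_{\varphi_2}$ in terms of Cartesian unit vectors.
Once again we use Equation (1.39), for which we need the spherical coordinates of
$P_2$:

$$r_2 = \sqrt{(-1)^2 + 2^2 + (-1)^2} = \sqrt6,\qquad \cos\theta_2 = \frac{z_2}{r_2} = -\frac{1}{\sqrt6},\qquad \tan\varphi_2 = \frac{y_2}{x_2} = -2.$$

Similarly, Equations (1.11) and (1.12) yield

$$\sin\theta_2 = +\sqrt{1 - \frac16} = \frac{\sqrt5}{\sqrt6},\qquad \cos\varphi_2 = -\frac{1}{\sqrt5},\qquad \sin\varphi_2 = +\frac{2}{\sqrt5}.$$

Then

$$\begin{aligned}
\mathbf{e}_{r_2} &= \mathbf{e}_x\sin\theta_2\cos\varphi_2 + \mathbf{e}_y\sin\theta_2\sin\varphi_2 + \mathbf{e}_z\cos\theta_2 = -\frac{1}{\sqrt6}\mathbf{e}_x + \frac{2}{\sqrt6}\mathbf{e}_y - \frac{1}{\sqrt6}\mathbf{e}_z,\\
\mathbf{e}_{\theta_2} &= \mathbf{e}_x\cos\theta_2\cos\varphi_2 + \mathbf{e}_y\cos\theta_2\sin\varphi_2 - \mathbf{e}_z\sin\theta_2 = \frac{1}{\sqrt{30}}\mathbf{e}_x - \frac{2}{\sqrt{30}}\mathbf{e}_y - \frac{5}{\sqrt{30}}\mathbf{e}_z,\\
\mathbf{e}_{\varphi_2} &= -\mathbf{e}_x\sin\varphi_2 + \mathbf{e}_y\cos\varphi_2 = -\frac{2}{\sqrt5}\mathbf{e}_x - \frac{1}{\sqrt5}\mathbf{e}_y.
\end{aligned}$$

We can now take the dot products required for the components:

$$\begin{aligned}
r_{\text{comp}} &= \mathbf{a}\cdot\mathbf{e}_{r_2} = \left(\frac{2}{\sqrt6}\mathbf{e}_x + \frac{2}{\sqrt6}\mathbf{e}_y - \frac{4}{\sqrt6}\mathbf{e}_z\right)\cdot\left(-\frac{1}{\sqrt6}\mathbf{e}_x + \frac{2}{\sqrt6}\mathbf{e}_y - \frac{1}{\sqrt6}\mathbf{e}_z\right) = -\frac26 + \frac46 + \frac46 = 1,\\
\theta_{\text{comp}} &= \mathbf{a}\cdot\mathbf{e}_{\theta_2} = \left(\frac{2}{\sqrt6}\mathbf{e}_x + \frac{2}{\sqrt6}\mathbf{e}_y - \frac{4}{\sqrt6}\mathbf{e}_z\right)\cdot\left(\frac{1}{\sqrt{30}}\mathbf{e}_x - \frac{2}{\sqrt{30}}\mathbf{e}_y - \frac{5}{\sqrt{30}}\mathbf{e}_z\right) = \frac{2}{6\sqrt5} - \frac{4}{6\sqrt5} + \frac{20}{6\sqrt5} = \frac{3}{\sqrt5},\\
\varphi_{\text{comp}} &= \mathbf{a}\cdot\mathbf{e}_{\varphi_2} = \left(\frac{2}{\sqrt6}\mathbf{e}_x + \frac{2}{\sqrt6}\mathbf{e}_y - \frac{4}{\sqrt6}\mathbf{e}_z\right)\cdot\left(-\frac{2}{\sqrt5}\mathbf{e}_x - \frac{1}{\sqrt5}\mathbf{e}_y\right) = -\frac{4}{\sqrt{30}} - \frac{2}{\sqrt{30}} = -\frac{6}{\sqrt{30}}.
\end{aligned}$$

It now follows that

$$\mathbf{a} = \mathbf{e}_{r_2} + \frac{3}{\sqrt5}\mathbf{e}_{\theta_2} - \frac{6}{\sqrt{30}}\mathbf{e}_{\varphi_2}.$$

As a check, we note that

$$\sqrt{1^2 + \left(\frac{3}{\sqrt5}\right)^2 + \left(\frac{6}{\sqrt{30}}\right)^2} = \sqrt{1 + \frac95 + \frac{36}{30}} = \sqrt4 = 2,$$

which agrees with the length of $\mathbf{a}$. ■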
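The bookkeeping of Example 1.4.1 (convert to Cartesian at $P_1$, then project onto the spherical unit vectors at $P_2$) is easy to automate. In this minimal sketch the quadrant of $\varphi$ is chosen with `math.atan2` rather than by the sign reasoning of Equations (1.11) and (1.12); that is a swapped-in convenience, and the helper names are hypothetical.

```python
import math


def spherical_coords(x, y, z):
    """(r, theta, phi) of a Cartesian point; atan2 picks phi's quadrant."""
    r = math.sqrt(x * x + y * y + z * z)
    return r, math.acos(z / r), math.atan2(y, x)


def spherical_unit_vectors(theta, phi):
    """Cartesian components of (e_r, e_theta, e_phi), Equation (1.39)."""
    st, ct = math.sin(theta), math.cos(theta)
    sp, cp = math.sin(phi), math.cos(phi)
    return (st * cp, st * sp, ct), (ct * cp, ct * sp, -st), (-sp, cp, 0.0)


def dot(u, w):
    return sum(x * y for x, y in zip(u, w))


def to_cartesian(components, point):
    """Cartesian form of a vector given its spherical components at point."""
    _, theta, phi = spherical_coords(*point)
    basis = spherical_unit_vectors(theta, phi)
    return tuple(sum(c * e[k] for c, e in zip(components, basis))
                 for k in range(3))


def spherical_components(vec, point):
    """Components of a Cartesian vector along e_r, e_theta, e_phi at point."""
    _, theta, phi = spherical_coords(*point)
    return tuple(dot(vec, e) for e in spherical_unit_vectors(theta, phi))


a = to_cartesian((0.0, 2.0, 0.0), (1.0, 1.0, 1.0))    # a = 2 e_theta at P1
comps = spherical_components(a, (-1.0, 2.0, -1.0))    # components at P2
print(a)        # close to (2, 2, -4)/sqrt(6)
print(comps)    # close to (1, 3/sqrt(5), -6/sqrt(30))
```

The same two helpers, applied in the opposite order, convert spherical components at one point into spherical components at any other point.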

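Example 1.4.2. Points $P_1$ and $P_2$ have spherical coordinates $(r_1, \theta_1, \varphi_1)$ and
$(r_2, \theta_2, \varphi_2)$, respectively. We want to find: (a) the angle between their position
vectors $\mathbf{r}_1$ and $\mathbf{r}_2$ in terms of their coordinates; (b) the spherical components of $\mathbf{r}_2$
at $P_1$; and (c) the spherical components of $\mathbf{r}_1$ at $P_2$. Once again, we shall express
all vectors in terms of Cartesian unit vectors when evaluating dot products.

(a) The cosine of the angle between the position vectors, call it $\gamma_{12}$, is simply
$\mathbf{e}_{r_1}\cdot\mathbf{e}_{r_2}$. We can readily find this by using Equation (1.39):

$$\begin{aligned}
\cos\gamma_{12} = \mathbf{e}_{r_1}\cdot\mathbf{e}_{r_2} &= (\mathbf{e}_x\sin\theta_1\cos\varphi_1 + \mathbf{e}_y\sin\theta_1\sin\varphi_1 + \mathbf{e}_z\cos\theta_1)\\
&\quad\cdot(\mathbf{e}_x\sin\theta_2\cos\varphi_2 + \mathbf{e}_y\sin\theta_2\sin\varphi_2 + \mathbf{e}_z\cos\theta_2)\\
&= \sin\theta_1\cos\varphi_1\sin\theta_2\cos\varphi_2 + \sin\theta_1\sin\varphi_1\sin\theta_2\sin\varphi_2 + \cos\theta_1\cos\theta_2\\
&= \sin\theta_1\sin\theta_2(\cos\varphi_1\cos\varphi_2 + \sin\varphi_1\sin\varphi_2) + \cos\theta_1\cos\theta_2\\
&= \sin\theta_1\sin\theta_2\cos(\varphi_1 - \varphi_2) + \cos\theta_1\cos\theta_2.
\end{aligned}$$

(b) To find the spherical components of $\mathbf{r}_2$ at $P_1$, we need to take the dot product
of $\mathbf{r}_2$ with the spherical unit vectors at $P_1$:

$$\begin{aligned}
r_{\text{comp}} &= \mathbf{r}_2\cdot\mathbf{e}_{r_1} = r_2\,\mathbf{e}_{r_2}\cdot\mathbf{e}_{r_1} = r_2[\sin\theta_1\sin\theta_2\cos(\varphi_1 - \varphi_2) + \cos\theta_1\cos\theta_2],\\
\theta_{\text{comp}} &= \mathbf{r}_2\cdot\mathbf{e}_{\theta_1} = r_2\,\mathbf{e}_{r_2}\cdot\mathbf{e}_{\theta_1}\\
&= r_2(\mathbf{e}_x\sin\theta_2\cos\varphi_2 + \mathbf{e}_y\sin\theta_2\sin\varphi_2 + \mathbf{e}_z\cos\theta_2)\cdot(\mathbf{e}_x\cos\theta_1\cos\varphi_1 + \mathbf{e}_y\cos\theta_1\sin\varphi_1 - \mathbf{e}_z\sin\theta_1)\\
&= r_2(\sin\theta_2\cos\varphi_2\cos\theta_1\cos\varphi_1 + \sin\theta_2\sin\varphi_2\cos\theta_1\sin\varphi_1 - \cos\theta_2\sin\theta_1)\\
&= r_2[\sin\theta_2\cos\theta_1\cos(\varphi_1 - \varphi_2) - \cos\theta_2\sin\theta_1],\\
\varphi_{\text{comp}} &= \mathbf{r}_2\cdot\mathbf{e}_{\varphi_1} = r_2\,\mathbf{e}_{r_2}\cdot\mathbf{e}_{\varphi_1}\\
&= r_2(\mathbf{e}_x\sin\theta_2\cos\varphi_2 + \mathbf{e}_y\sin\theta_2\sin\varphi_2 + \mathbf{e}_z\cos\theta_2)\cdot(-\mathbf{e}_x\sin\varphi_1 + \mathbf{e}_y\cos\varphi_1)\\
&= r_2(-\sin\theta_2\cos\varphi_2\sin\varphi_1 + \sin\theta_2\sin\varphi_2\cos\varphi_1) = r_2\sin\theta_2\sin(\varphi_2 - \varphi_1).
\end{aligned}$$

(c) The spherical components of $\mathbf{r}_1$ at $P_2$ can be found similarly. In fact,
switching the indices “1” and “2” in the expressions of part (b) gives the desired
formulas.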
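The closed-form answers of Example 1.4.2 can be compared with direct dot products for sample coordinates. In this sketch the angles are arbitrary test values (hypothetical), and $r_1$ is omitted because it does not enter any of the formulas in parts (a) and (b).

```python
import math


def spherical_unit_vectors(theta, phi):
    """Cartesian components of (e_r, e_theta, e_phi), Equation (1.39)."""
    st, ct = math.sin(theta), math.cos(theta)
    sp, cp = math.sin(phi), math.cos(phi)
    return (st * cp, st * sp, ct), (ct * cp, ct * sp, -st), (-sp, cp, 0.0)


def dot(u, w):
    return sum(x * y for x, y in zip(u, w))


def components_of_r2_at_p1(r2, t1, p1, t2, p2):
    """Closed-form answers of part (b)."""
    s1, c1, s2, c2 = math.sin(t1), math.cos(t1), math.sin(t2), math.cos(t2)
    return (r2 * (s1 * s2 * math.cos(p1 - p2) + c1 * c2),
            r2 * (s2 * c1 * math.cos(p1 - p2) - c2 * s1),
            r2 * s2 * math.sin(p2 - p1))


t1, p1 = 0.4, 1.1                 # angles of P1 (sample values)
r2, t2, p2 = 3.0, 2.2, -0.6       # spherical coordinates of P2 (sample values)

e_r1, e_t1, e_p1 = spherical_unit_vectors(t1, p1)
e_r2 = spherical_unit_vectors(t2, p2)[0]
R2 = tuple(r2 * c for c in e_r2)                      # position vector of P2

cos_gamma = dot(e_r1, e_r2)                           # part (a), numerically
direct = (dot(R2, e_r1), dot(R2, e_t1), dot(R2, e_p1))
closed = components_of_r2_at_p1(r2, t1, p1, t2, p2)
print(cos_gamma)
print(direct)
print(closed)
```

Swapping the roles of the two points in the call gives the part (c) components, in line with the index-switching remark above.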


Example 1.4.3. To illustrate further the conversion of vectors from one coordinate
system to another, consider a charge $q$ that is located at the cylindrical coordinates
$(a, \pi/3, -a)$. We want to find the spherical components of the electrostatic field $\mathbf{E}$
of this charge at a point $P$ with Cartesian coordinates $(a, a, a)$.

The most straightforward way of doing this is to convert all coordinates to
Cartesian, find the field, and then take the dot products with appropriate unit
vectors. The Cartesian coordinates of the charge are

$$x_q = \rho_q\cos\varphi_q = a\cos\left(\frac{\pi}{3}\right) = \frac12 a,\qquad y_q = \rho_q\sin\varphi_q = a\sin\left(\frac{\pi}{3}\right) = \frac{\sqrt3\,a}{2} \approx 0.866a,\qquad z_q = -a.$$
Thus,

$$\mathbf{r} - \mathbf{r}_q = \left(a - \tfrac12 a\right)\mathbf{e}_x + (a - 0.866a)\mathbf{e}_y + (a - (-a))\mathbf{e}_z = 0.5a\,\mathbf{e}_x + 0.134a\,\mathbf{e}_y + 2a\,\mathbf{e}_z$$

and

$$|\mathbf{r} - \mathbf{r}_q| = \sqrt{(0.5a)^2 + (0.134a)^2 + (2a)^2} = 2.066a,$$

and the electric field at $P$ can be written in terms of Cartesian unit vectors at $P$:

$$\mathbf{E} = \frac{k_e q}{|\mathbf{r} - \mathbf{r}_q|^3}(\mathbf{r} - \mathbf{r}_q) = k_e q\,\frac{0.5a\,\mathbf{e}_x + 0.134a\,\mathbf{e}_y + 2a\,\mathbf{e}_z}{(2.066a)^3} = k_e q\,\frac{0.5\mathbf{e}_x + 0.134\mathbf{e}_y + 2\mathbf{e}_z}{8.818a^2} = \frac{k_e q}{a^2}(0.0567\mathbf{e}_x + 0.0152\mathbf{e}_y + 0.2268\mathbf{e}_z).$$

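To find the spherical components of the field at $P$, we first express the spherical
unit vectors at $P$ in terms of Cartesian unit vectors. For this, we need the spherical
coordinates of $P$:

$$r = \sqrt{a^2 + a^2 + a^2} = \sqrt3\,a = 1.732a,\qquad \cos\theta = \frac{z}{r} = \frac{a}{\sqrt3\,a} = \frac{1}{\sqrt3} = 0.577 \;\Rightarrow\; \theta = 0.955,\qquad \tan\varphi = \frac{y}{x} = \frac{a}{a} = 1 \;\Rightarrow\; \varphi = \frac{\pi}{4} = 0.785.$$

It now follows that

$$\begin{aligned}
\mathbf{e}_r &= \mathbf{e}_x\sin\theta\cos\varphi + \mathbf{e}_y\sin\theta\sin\varphi + \mathbf{e}_z\cos\theta = 0.577\mathbf{e}_x + 0.577\mathbf{e}_y + 0.577\mathbf{e}_z,\\
\mathbf{e}_\theta &= \mathbf{e}_x\cos\theta\cos\varphi + \mathbf{e}_y\cos\theta\sin\varphi - \mathbf{e}_z\sin\theta = 0.408\mathbf{e}_x + 0.408\mathbf{e}_y - 0.816\mathbf{e}_z,\\
\mathbf{e}_\varphi &= -\mathbf{e}_x\sin\varphi + \mathbf{e}_y\cos\varphi = -0.707\mathbf{e}_x + 0.707\mathbf{e}_y.
\end{aligned}$$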

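Now we take the dot product of $\mathbf{E}$ with these unit vectors to find its spherical
components at $P$. The reader may first easily check that

$$\begin{array}{lll}
\mathbf{e}_r\cdot\mathbf{e}_x = 0.577, & \mathbf{e}_r\cdot\mathbf{e}_y = 0.577, & \mathbf{e}_r\cdot\mathbf{e}_z = 0.577,\\
\mathbf{e}_\theta\cdot\mathbf{e}_x = 0.408, & \mathbf{e}_\theta\cdot\mathbf{e}_y = 0.408, & \mathbf{e}_\theta\cdot\mathbf{e}_z = -0.816,\\
\mathbf{e}_\varphi\cdot\mathbf{e}_x = -0.707, & \mathbf{e}_\varphi\cdot\mathbf{e}_y = 0.707, & \mathbf{e}_\varphi\cdot\mathbf{e}_z = 0.
\end{array}$$

We can now finally calculate the field components:

$$\begin{aligned}
E_r &= \mathbf{E}\cdot\mathbf{e}_r = \frac{k_e q}{a^2}(0.0567\,\mathbf{e}_r\cdot\mathbf{e}_x + 0.0152\,\mathbf{e}_r\cdot\mathbf{e}_y + 0.2268\,\mathbf{e}_r\cdot\mathbf{e}_z)\\
&= \frac{k_e q}{a^2}(0.0567\times0.577 + 0.0152\times0.577 + 0.2268\times0.577) \approx 0.1724\,\frac{k_e q}{a^2},\\
E_\theta &= \mathbf{E}\cdot\mathbf{e}_\theta = \frac{k_e q}{a^2}(0.0567\,\mathbf{e}_\theta\cdot\mathbf{e}_x + 0.0152\,\mathbf{e}_\theta\cdot\mathbf{e}_y + 0.2268\,\mathbf{e}_\theta\cdot\mathbf{e}_z)\\
&= \frac{k_e q}{a^2}(0.0567\times0.408 + 0.0152\times0.408 - 0.2268\times0.816) \approx -0.1558\,\frac{k_e q}{a^2},\\
E_\varphi &= \mathbf{E}\cdot\mathbf{e}_\varphi = \frac{k_e q}{a^2}(0.0567\,\mathbf{e}_\varphi\cdot\mathbf{e}_x + 0.0152\,\mathbf{e}_\varphi\cdot\mathbf{e}_y + 0.2268\,\mathbf{e}_\varphi\cdot\mathbf{e}_z)\\
&= \frac{k_e q}{a^2}(-0.0567\times0.707 + 0.0152\times0.707) \approx -0.0294\,\frac{k_e q}{a^2}.
\end{aligned}$$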


The choice of Cartesian coordinates was the most straightforward one, but one
can choose any other coordinate system to calculate the field and find the components
in any other set of unit vectors. The reader is urged to try the other
choices. ■
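The arithmetic of Example 1.4.3 can be reproduced without intermediate rounding. This minimal sketch assumes units in which $a = 1$ and $k_e q = 1$, so the field comes out directly as a multiple of $k_e q/a^2$. Carrying full precision, the components agree with the rounded values quoted above to within about $10^{-4}$, the size of the rounding used in the hand calculation.

```python
import math

# Units (assumed): lengths in units of a, field in units of k_e q / a^2.
rho_q, phi_q, z_q = 1.0, math.pi / 3, -1.0              # charge, cylindrical
r_q = (rho_q * math.cos(phi_q), rho_q * math.sin(phi_q), z_q)
P = (1.0, 1.0, 1.0)                                    # field point, Cartesian


def dot(u, w):
    return sum(x * y for x, y in zip(u, w))


d = tuple(p - s for p, s in zip(P, r_q))               # r - r_q
E = tuple(c / math.sqrt(dot(d, d)) ** 3 for c in d)    # Cartesian components

r = math.sqrt(dot(P, P))
theta, phi = math.acos(P[2] / r), math.atan2(P[1], P[0])
st, ct, sp, cp = math.sin(theta), math.cos(theta), math.sin(phi), math.cos(phi)
e_r = (st * cp, st * sp, ct)                           # Equation (1.39)
e_theta = (ct * cp, ct * sp, -st)
e_phi = (-sp, cp, 0.0)

E_r, E_theta, E_phi = dot(E, e_r), dot(E, e_theta), dot(E, e_phi)
print(E)                        # roughly (0.0567, 0.0152, 0.2268)
print(E_r, E_theta, E_phi)      # roughly 0.172, -0.156, -0.029
```

Because the spherical unit vectors at $P$ are orthonormal, $E_r^2 + E_\theta^2 + E_\varphi^2$ equals $|\mathbf{E}|^2$, which is a convenient check on any hand calculation of this kind.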
1.5 Problems

1.1. Find the equation of a line that passes through the following pairs of
points:

(a) (1, 0, 1) and (-1, 1, 0). (b) (2, 2, -1) and (-2, -1, 1).

(c) (1, 1, 1) and (-1, 1, -1). (d) (1, 1, 1) and (-2, 2, 0).

(e) (0, 2, -1) and (3, -1, 1). (f) (0, 1, 0) and (-1, 0, -1).

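1.2. Use Figure 1.4 and the interpretation of $\mathbf{a}\cdot\mathbf{b}$ as the product of the
length of $\mathbf{a}$ with the projection of $\mathbf{b}$ along $\mathbf{a}$ to show that

$$(\mathbf{a} + \mathbf{b})\cdot\mathbf{c} = \mathbf{a}\cdot\mathbf{c} + \mathbf{b}\cdot\mathbf{c}.$$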

1.3. Take the dot product of $\mathbf{a} = \mathbf{b} - \mathbf{c}$ with itself and prove the law of cosines
by interpreting the result geometrically. Note that the three vectors form a
triangle.

1.4. Find the angle between $\mathbf{a} = 2\mathbf{e}_x + 3\mathbf{e}_y + \mathbf{e}_z$ and $\mathbf{b} = \mathbf{e}_x - 6\mathbf{e}_y + 2\mathbf{e}_z$.

1.5. Find the angle between $\mathbf{a} = 9\mathbf{e}_x + \mathbf{e}_y - 6\mathbf{e}_z$ and $\mathbf{b} = 4\mathbf{e}_x - 6\mathbf{e}_y + 5\mathbf{e}_z$.

1.6. Show that $\mathbf{a} = \mathbf{e}_x\cos\alpha + \mathbf{e}_y\sin\alpha$ and $\mathbf{b} = \mathbf{e}_x\cos\beta + \mathbf{e}_y\sin\beta$ are unit
vectors in the $xy$-plane making angles $\alpha$ and $\beta$ with the $x$-axis. Then take their
dot product and obtain a formula for $\cos(\alpha - \beta)$. Now use $\sin x = \cos(\pi/2 - x)$
to find the formula for $\sin(\alpha - \beta)$.

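1.7. Vectors $\mathbf{a}$ and $\mathbf{b}$ are the sides of a parallelogram, $\mathbf{c}$ and $\mathbf{d}$ are its diagonals,
and $\theta$ is the angle between $\mathbf{a}$ and $\mathbf{b}$. Show that

$$|\mathbf{c}|^2 + |\mathbf{d}|^2 = 2(|\mathbf{a}|^2 + |\mathbf{b}|^2)$$

and that

$$|\mathbf{c}|^2 - |\mathbf{d}|^2 = 4|\mathbf{a}||\mathbf{b}|\cos\theta.$$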

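1.8. Given $\mathbf{a}$, $\mathbf{b}$, and $\mathbf{c}$, vectors from the origin to the points $A$, $B$, and $C$,
show that the vector $(\mathbf{a}\times\mathbf{b}) + (\mathbf{b}\times\mathbf{c}) + (\mathbf{c}\times\mathbf{a})$ is perpendicular to the plane
$ABC$.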

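1.9. Show that the vectors $\mathbf{a} = 2\mathbf{e}_x - \mathbf{e}_y + \mathbf{e}_z$, $\mathbf{b} = \mathbf{e}_x - 3\mathbf{e}_y - 5\mathbf{e}_z$, and
$\mathbf{c} = 3\mathbf{e}_x - 4\mathbf{e}_y - 4\mathbf{e}_z$ form the sides of a right triangle.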

1.10. (a) Find the vector form of the equation of the plane defined by the three
points $P$, $Q$, and $R$ with coordinates $(p_1, p_2, p_3)$, $(q_1, q_2, q_3)$, and $(r_1, r_2, r_3)$,
respectively. Hint: The vector from $P$ to a point $X = (x, y, z)$ in the plane
is perpendicular to the cross product of $\overrightarrow{PQ}$ and $\overrightarrow{PR}$.

(b) Determine an equation for the plane passing through the points (2, -1, 1),
(3, 2, -1), and (-1, 3, 2).

1.11. Derive the law of sines for a triangle using vectors. 
1.12. Using vectors, show that the diagonals of a rhombus are orthogonal.

1.13. Show that a necessary and sufficient condition for three vectors to be 
in the same plane is that the dot product of one with the cross product of the 
other two be zero. 

1.14. Show that two nonzero vectors have the same direction if and only if 
their cross product vanishes. 

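1.15. Show the following vector identities by writing each vector in terms of
Cartesian unit vectors and showing that each component of the LHS is equal
to the corresponding component of the RHS.

(a) $\mathbf{a}\cdot(\mathbf{b}\times\mathbf{c}) = \mathbf{c}\cdot(\mathbf{a}\times\mathbf{b}) = \mathbf{b}\cdot(\mathbf{c}\times\mathbf{a})$.

(b) $\mathbf{a}\times(\mathbf{b}\times\mathbf{c}) = \mathbf{b}(\mathbf{a}\cdot\mathbf{c}) - \mathbf{c}(\mathbf{a}\cdot\mathbf{b})$; this is called the bac cab rule.

(c) $(\mathbf{a}\times\mathbf{b})\cdot(\mathbf{c}\times\mathbf{d}) = (\mathbf{a}\cdot\mathbf{c})(\mathbf{b}\cdot\mathbf{d}) - (\mathbf{a}\cdot\mathbf{d})(\mathbf{b}\cdot\mathbf{c})$.

(d) $(\mathbf{a}\times\mathbf{b})\times(\mathbf{c}\times\mathbf{d}) = \mathbf{b}[\mathbf{a}\cdot(\mathbf{c}\times\mathbf{d})] - \mathbf{a}[\mathbf{b}\cdot(\mathbf{c}\times\mathbf{d})]$.

(e) $(\mathbf{a}\times\mathbf{b})\times(\mathbf{c}\times\mathbf{d}) = \mathbf{c}[\mathbf{a}\cdot(\mathbf{b}\times\mathbf{d})] - \mathbf{d}[\mathbf{a}\cdot(\mathbf{b}\times\mathbf{c})]$.

(f) $(\mathbf{a}\times\mathbf{b})\cdot(\mathbf{a}\times\mathbf{b}) = |\mathbf{a}|^2|\mathbf{b}|^2 - (\mathbf{a}\cdot\mathbf{b})^2$.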

1.16. Convert the following triplets from the given coordinate system to the
other two. All angles are in radians.

Cartesian: (1, 2, 1), (0, 0, 1), (1, -1, 0), (0, 1, 0), (1, 1, 1), (2, 2, 2), (0, 0, 5),
(1, 1, 0), (1, 0, 0).

Spherical: (2, π/3, π/4), (5, 0, π/3), (3, π/3, 3π/4), (1, 1, 0), (1, 0, 0),
(5,0,*), (3, π, <?), (0,*,O).

Cylindrical: (0, A. 4), (2, π, 0), (0,217, -18), (1, 3π/4, -2), (1, 2, 3), (1, 0, 0).

1.17. Derive the second and third relations in Equation (1.21). 

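1.18. Points $P$ and $P'$ have spherical coordinates $(r, \theta, \varphi)$ and $(r', \theta', \varphi')$,
cylindrical coordinates $(\rho, \varphi, z)$ and $(\rho', \varphi', z')$, and Cartesian coordinates
$(x, y, z)$ and $(x', y', z')$, respectively. Write $|\mathbf{r} - \mathbf{r}'|$ in all three coordinate
systems. Hint: Use Equation (1.2) with $\mathbf{a} = \mathbf{r} - \mathbf{r}'$ and $\mathbf{r}$ and $\mathbf{r}'$ written in
terms of appropriate unit vectors.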

1.19. Show that Equation (1.24) is independent of where we choose the origin 
to be. Hint: Pick a different origin O' whose position vector relative to O is 
R and write the equation in terms of position vectors relative to O' and show 
that the final result is the same as in Equation (1.24). 

1.20. Three point charges are located at the corners of an equilateral triangle
of sides $a$ with the origin at the center of the triangle as shown in Figure 1.19.

(a) Find the general expression for the electric field and electric potential at
(0, 0, z).

(b) Find a relation between $q$ and $Q$ such that the $z$-component of the field
vanishes for all values of $z$. What are $\mathbf{E}$ and $\Phi$ for such charges?

(c) Calculate $\mathbf{E}$ and $\Phi$ for $z = a$.
1.21. A point charge $Q$ and two point charges $q$ are located in the $xy$-plane
at the corners of an equilateral triangle of side $a$ as shown in Figure 1.20.

(a) Find the potential and the Cartesian components of the electrostatic field
at (0, 0, z).

(b) Show that it is impossible for $\mathbf{E}$ to be along the $z$-axis.

(c) Calculate $\mathbf{E}$ for $z = a$ and find $Q$ in terms of $q$ such that $E_z$ vanishes for
this value of $z$.

(d) What is the value of $\Phi$ at $z = a$ for the charges found in (c)?

1.22. Three point charges each of magnitude $Q$ and one point charge $q$ are
located at the corners of a square of side $2a$. Using an appropriate coordinate
system,

(a) Find the electric field and potential at point $P$ located on the diagonal
from $Q$ to $q$ (and beyond) a distance $2\sqrt2\,a$ from the center.

(b) Find a relation, if it exists, between $q$ and $Q$ such that the field vanishes
at $P$.

1.23. A charge $q$ is located at the spherical coordinates $(a, \pi/4, \pi/3)$. Find
the electrostatic potential and the Cartesian components of the electrostatic
field of this charge at a point $P$ with spherical coordinates $(a, \pi/6, \pi/4)$. Write
the field components as numerical multiples of $k_eq/a^2$, and the potential as a
numerical multiple of $k_eq/a$.

Figure 1.20:

1.24. A charge $q$ is located at the cylindrical coordinates $(a, \pi/4, 2a)$. Find
the Cartesian components of the electrostatic field of this charge at a point $P$
with cylindrical coordinates $(2a, \pi/6, a)$. Write your answers as a numerical
multiple of $k_eq/a^2$. Find the electrostatic potential at $P$ and express it as a
numerical multiple of $k_eq/a$.

1.25. A charge $q$ is located at the cylindrical coordinates $(a, \pi/3, -a)$.

(a) Find the Cartesian components of the electrostatic field $\mathbf{E}$ of this charge
at a point $P$ with cylindrical coordinates $(a, \pi/4, 2a)$. Write your answers as
a numerical multiple of $k_eq/a^2$.

(b) Write $\mathbf{E}$ in terms of the cylindrical unit vectors at $P$.

(c) Find the electrostatic potential at $P$ as a numerical multiple of $k_eq/a$.

1.26. Two charges q and $-2q$ are located at the cylindrical coordinates
$(a, \pi/4, a)$ and $(a, 2\pi/3, -a)$, respectively.

(a) Find the Cartesian components of the electrostatic field at a point P with
spherical coordinates $(3a, \pi/6, \pi/4)$. Write your answers as numerical
multiples of $k_eq/a^2$.

(b) Find the electrostatic potential at P. Write your answer as a numerical
multiple of $k_eq/a$.

1.27. Two charges $3q$ and $-q$ are located at the spherical coordinates
$(a, \pi/3, \pi/6)$ and $(2a, \pi/6, \pi/4)$, respectively.

(a) Find the cylindrical components of the electrostatic field at a point P
with spherical coordinates $(3a, \pi/4, \pi/4)$. Write your answers as numerical
multiples of $k_eq/a^2$.

(b) Find the electrostatic potential at P. Write your answer as a numerical
multiple of $k_eq/a$.

1.28. A charge q is located at the spherical coordinates $(a, \pi/3, \pi/6)$. Find
the Cartesian components of the electrostatic field of this charge at a point P
with cylindrical coordinates $(a, 2\pi/3, 2a)$. Write your answers as numerical
multiples of $k_eq/a^2$. Also find the electrostatic potential at P.

1.29. Four charges are located at Cartesian coordinates as follows: q at
$(2a, 0, 0)$, $-2q$ at $(0, 2a, 0)$, $-\frac{4\sqrt{2}}{5\sqrt{5}}\,q$ at $(-a, 0, 0)$, and $-\frac{8\sqrt{2}}{5\sqrt{5}}\,q$ at $(0, -a, 0)$. Find
the Cartesian components of the electrostatic field at $(0, 0, a)$.

1.30. Charge q is moving at constant speed v along the positive x-axis. Two
other charges, $-q$ and $2q$, are moving at constant speeds v and 2v along the positive
y and negative z axes, respectively. Assume that at $t = 0$, q is at the origin,
$-q$ is at $(0, a, 0)$, and $2q$ is at $(0, 0, -a)$.

(a) Find the Cartesian components of the magnetic field at a point $(x, y, z)$
for $t > 0$.

(b) Find the cylindrical components of the magnetic field at a point $(\rho, \varphi, z)$
for $t > 0$.

(c) Find the spherical components of the magnetic field at a point $(r, \theta, \varphi)$ for
$t > 0$.

1.31. A charge q is moving at constant speed v along a curve parametrized by

$$x' = 6as,\qquad y' = 3as^2,\qquad z' = -2as^3.$$

(a) Find the Cartesian components of the magnetic field at a point $(x, y, z)$
as a function of s.

(b) Find the cylindrical components of the magnetic field at a point $(\rho, \varphi, z)$
as a function of s.

(c) Find the spherical components of the magnetic field at a point $(r, \theta, \varphi)$ as
a function of s.

1.32. Points $P_1$ and $P_2$ have Cartesian coordinates (1, 1, 1) and (1, 1, 0),
respectively.

(a) Find the spherical coordinates of $P_1$ and $P_2$.

(b) Write down the components of $\mathbf{r}_1$, the position vector of $P_1$, in terms of
spherical unit vectors at $P_1$.

(c) Write down the components of $\mathbf{r}_2$, the position vector of $P_2$, in terms of
spherical unit vectors at $P_1$.

1.33. Points $P_1$ and $P_2$ have Cartesian coordinates (2, 2, 0) and (1, 0, 1),
respectively.

(a) Find the spherical coordinates of $P_1$.

(b) Express $\hat{\mathbf{e}}_{r_1}$, $\hat{\mathbf{e}}_{\theta_1}$, and $\hat{\mathbf{e}}_{\varphi_1}$, the spherical unit vectors at $P_1$, in terms of the
Cartesian unit vectors.

(c) Find the components of the position vector of $P_2$ along the spherical unit
vectors at $P_1$.

(d) From its components in (c) find the length of $\mathbf{r}_2$, and show that it agrees
with the length as calculated from its Cartesian components.

1.34. Points $P_1$ and $P_2$ have spherical coordinates

$P_1$: $(a, \pi/4, \pi/3)$ and $P_2$: $(a, \pi/3, \pi/4)$.

(a) Find the angle between their position vectors $\mathbf{r}_1$ and $\mathbf{r}_2$.

(b) Find the spherical components of $\mathbf{r}_2 - \mathbf{r}_1$ at $P_1$.

(c) Find the spherical components of $\mathbf{r}_2 - \mathbf{r}_1$ at $P_2$.

1.35. Point $P_1$ has Cartesian coordinates (1, 1, 0), point $P_2$ has cylindrical
coordinates (1, 1, 0), and point $P_3$ has spherical coordinates (1, 1, 0), where all
angles are in radians. Express $\mathbf{r}_3 - \mathbf{r}_1$ in terms of the spherical unit vectors
at $P_2$.

1.36. Points $P_1$ and $P_2$ have Cartesian coordinates (1, 1, 1) and (1, 2, 1), and
position vectors $\mathbf{r}_1$ and $\mathbf{r}_2$, respectively.

(a) Find the spherical coordinates of $P_1$ and $P_2$.

(b) Find the components of $\mathbf{r}_1$ in terms of spherical unit vectors at $P_1$.






(c) Find the components of $\mathbf{r}_2$ in terms of spherical unit vectors at $P_2$.

(d) Find the components of $\mathbf{r}_1$ in terms of spherical unit vectors at $P_2$.

(e) Find the components of $\mathbf{r}_2$ in terms of spherical unit vectors at $P_1$.

1.37. Points $P_1$ and $P_2$ have Cartesian coordinates

$(x_1, y_1, z_1)$ and $(x_2, y_2, z_2)$.

(a) Find the angle between their position vectors $\mathbf{r}_1$ and $\mathbf{r}_2$ in terms of their
coordinates.

(b) Find the Cartesian components of $\mathbf{r}_2 - \mathbf{r}_1$ at $P_1$.

(c) Find the Cartesian components of $\mathbf{r}_2 - \mathbf{r}_1$ at $P_2$.

1.38. Points $P_1$ and $P_2$ have cylindrical coordinates

$(\rho_1, \varphi_1, z_1)$ and $(\rho_2, \varphi_2, z_2)$.

(a) Find the angle between their position vectors $\mathbf{r}_1$ and $\mathbf{r}_2$ in terms of their
coordinates.

(b) Find the cylindrical components of $\mathbf{r}_2 - \mathbf{r}_1$ at $P_1$.

(c) Find the cylindrical components of $\mathbf{r}_2 - \mathbf{r}_1$ at $P_2$.

1.39. Write the Cartesian unit vectors in terms of spherical unit vectors with 
coefficients written in spherical coordinates. 

1.40. Write the spherical unit vectors in terms of Cartesian unit vectors with 
coefficients written in Cartesian coordinates. 

1.41. In Example 1.4.3, calculate the electric field using cylindrical coordinates,
then find the components in terms of (a) Cartesian and (b) spherical
unit vectors.

1.42. In Example 1.4.3, calculate the electric field using spherical coordinates, 
then find the components in terms of (a) Cartesian and (b) cylindrical unit 
vectors. 
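The numerical parts of Problems 1.23 through 1.29 reduce to coordinate conversions plus Coulomb's law, so a short script can serve as a check on hand calculations. The Python sketch below is one possible way to set this up. It assumes the convention of this chapter (θ is the polar angle measured from the z-axis, φ the azimuthal angle) and works in units with $k_e = 1$ and $a = 1$, so that field components come out as multiples of $k_eq/a^2$ and potentials as multiples of $k_eq/a$. All function names are illustrative, not taken from the text.

```python
import math

def spherical_to_cartesian(r, theta, phi):
    """(r, theta, phi) -> (x, y, z); theta is the polar angle, phi the azimuth."""
    return (r * math.sin(theta) * math.cos(phi),
            r * math.sin(theta) * math.sin(phi),
            r * math.cos(theta))

def cylindrical_to_cartesian(rho, phi, z):
    """(rho, phi, z) -> (x, y, z)."""
    return (rho * math.cos(phi), rho * math.sin(phi), z)

def spherical_unit_vectors(theta, phi):
    """Cartesian components of e_r, e_theta, e_phi at a point with angles (theta, phi)."""
    e_r = (math.sin(theta) * math.cos(phi), math.sin(theta) * math.sin(phi), math.cos(theta))
    e_theta = (math.cos(theta) * math.cos(phi), math.cos(theta) * math.sin(phi), -math.sin(theta))
    e_phi = (-math.sin(phi), math.cos(phi), 0.0)
    return e_r, e_theta, e_phi

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def field_and_potential(charges, point):
    """Superpose point charges [(q, (x, y, z)), ...] at `point`, in units with k_e = 1."""
    E = [0.0, 0.0, 0.0]
    potential = 0.0
    for q, source in charges:
        d = [p - s for p, s in zip(point, source)]   # vector from the charge to the field point
        dist = math.sqrt(dot(d, d))
        for i in range(3):
            E[i] += q * d[i] / dist**3
        potential += q / dist
    return tuple(E), potential

# Set-up of Problem 1.23 with q = 1 and a = 1; the numbers are left for comparison.
source = spherical_to_cartesian(1.0, math.pi / 4, math.pi / 3)
P = spherical_to_cartesian(1.0, math.pi / 6, math.pi / 4)
E_P, phi_P = field_and_potential([(1.0, source)], P)
print(E_P, phi_P)
```

Components along the spherical unit vectors at a point, as asked for in several of the later problems, follow by taking `dot` of a vector with each of the three vectors returned by `spherical_unit_vectors`.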




Chapter 2 

Differentiation 


Physics deals with both the large and the small. Its domain of study includes 
the interior of the nucleus of an atom as well as the exterior of a galaxy. It is, 
therefore, natural for the scope of physical theories to switch between global,
or large-scale, and local, or small-scale, regimes. Such an interplay between
the local and the global has existed ever since Newton and others discovered 
the mathematical translation of this interplay: Derivatives are defined as local 
objects while integrals encompass global properties. This chapter is devoted 
to the concept of differentiation, which we shall consider as a natural tool with 
which many physical concepts are expressed most concisely and conveniently. 

All physical quantities reside in space and change with time. Even a static
quantity, once scrutinized, will reveal noticeable attributes of change, validating
the old adage “The only thing that doesn’t change is the change itself.”
Thus, static, or time-independent, quantities are so only as approximations 
to the true physical quantity which is dynamic. 

Take the temperature of the surface of the Earth. As we move about on the 
globe, we notice the variation of this quantity with location—poles as opposed 
to the equator—and with time—winter versus summer. A specification of 
temperature requires that of location and time. We thus speak of local and 
instantaneous temperature. This is an example of the fact that, generally 
speaking, all physical quantities are functions of space and time. 

Locality and instantaneity have both a mathematical and a physical (or 
operational) interpretation. Mathematically, they correspond to a point in 
space and an instant of time with no extension or spread whatsoever. Physi¬ 
cally, or operationally, many quantities require an extension in space and an 
interval in time to be defined. Thus, a local weatherman’s morning statement 
“Today’s high will be 45” limits the location to the size of a city, and the time 
to at most a.m. or p.m. This is admittedly a rough localization, suitable for 
a weatherman’s forecast. Nevertheless, even the most precise statements in 
physics embody a space extension as well as a time interval whose “sizes” are 
determined by the physical system under investigation. If we are studying 
heat conduction by a metal bar several inches long, then “local” temperature 
takes on a completely different meaning from the weatherman’s “local” temperature.
In the latter case, a city is as local as one gets, while in the former,
variations over a centimeter are significant.


velocity as an example of an instantaneously defined quantity
derivative
derivative as rate of change: independent variable is time or a coordinate
derivative as the ratio of two infinitesimal physical quantities
particles and fields


2.1 The Derivative 


A prime example of an instantaneously defined quantity is velocity. To find
the velocity of a moving particle at time $t_0$, determine its position $\mathbf{r}_0$ at time
$t_0$, determine also its position $\mathbf{r}$ at time t with t close to $t_0$, divide $\mathbf{r} - \mathbf{r}_0$ by
$t - t_0$, and make $t - t_0$ as small as possible. This defines the derivative of $\mathbf{r}$
with respect to t, which we call velocity $\mathbf{v}$:


r(t 0 ) = lim 


r - r 0 

t—>to t — to 
Acceleration is defined similarly: 

v — v 0 _ dv 


dr 

dt 


= r(t 0 ). 


a(fo) = lim 


o t — to 


dt 


t—to 


t=to 


d 2 r 
dt 2 


= f(f 0 ). 


t=io 
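The limit in these definitions can be mimicked numerically by shrinking $t - t_0$. The sketch below assumes a made-up trajectory $\mathbf{r}(t) = (t^2, t^3, 0)$, chosen only for illustration. It evaluates the difference quotient for a few step sizes and applies the same quotient again to the velocity estimate to approximate the acceleration. At $t_0 = 1$ the exact values are $\mathbf{v} = (2, 3, 0)$ and $\mathbf{a} = (2, 6, 0)$.

```python
def trajectory(t):
    """Hypothetical position r(t) = (t**2, t**3, 0), used only for illustration."""
    return (t**2, t**3, 0.0)

def velocity(path, t0, h):
    """Difference quotient (r(t0 + h) - r(t0)) / h, the ratio inside the limit."""
    r0, r1 = path(t0), path(t0 + h)
    return tuple((b - a) / h for a, b in zip(r0, r1))

def acceleration(path, t0, h):
    """The same quotient applied to the velocity estimate: (v(t0 + h) - v(t0)) / h."""
    v0, v1 = velocity(path, t0, h), velocity(path, t0 + h, h)
    return tuple((b - a) / h for a, b in zip(v0, v1))

for h in (1e-1, 1e-2, 1e-3):
    # As t - t0 = h shrinks, the estimates approach (2, 3, 0) and (2, 6, 0).
    print(h, velocity(trajectory, 1.0, h), acceleration(trajectory, 1.0, h))
```

Making h extremely small eventually costs accuracy through floating-point rounding. This is a limitation of finite arithmetic, not of the mathematical limit itself.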


Velocity and acceleration are examples of derivatives which are generally
called rates of change. In a rate of change, one is interested in the way
a quantity (the dependent variable) changes as another quantity (the independent
variable) is allowed to vary. In most rates of change, the independent
variable is either time or one of the space coordinates.

The second type of derivative is simply the ratio of two infinitesimal physical
quantities. In general, whenever a physical quantity Q is defined as the
ratio of two other physical quantities R and S, one must define Q in a small
neighborhood (small volume, area, length, or time interval). One, therefore,
writes


$$Q = \lim_{\Delta S\to 0}\frac{\Delta R}{\Delta S} = \frac{dR}{dS}, \tag{2.1}$$


where $\Delta R$ and $\Delta S$ are both local small quantities. Being physical quantities,
both R and S, and therefore $\Delta R$ and $\Delta S$, are, in general, functions of position
and time. Hence, their ratio, Q, is also a function of position and time. The
last sentence requires further elaboration.

In physics, we deal with two completely different, yet subtly related, objects:
particles and fields. The former is no doubt familiar to the reader.
Examples of the latter are the gravitational, electric, and magnetic fields, as
well as the less familiar velocity field of a fluid such as water in a river or air
in the atmosphere. Suppose we want to specify the “state” of the two types
of objects at a particular time t. For a particle, this means determining its
position and momentum or velocity¹ at t. Imagine the particle carrying with
it a vector representing its velocity. Then a snapshot of the particle at time t
depicts its location as well as its velocity, and thus a complete specification
of the particle. A large collection of such snapshots specifies the motion of the
particle. Since each snapshot represents an instant of time, and since the collection
of snapshots specifies the motion, we conclude that, for particles, the
only independent variable is time.² A problem involving a classical particle is
solved once we find its position as a function of time alone.

¹ It is a fundamental result of classical mechanics that such a specification completely
determines the subsequent motion of the particle and, therefore, any other property of the
particle will be specified by the initial position and momentum.

How do we specify the “state” of a fluid? A fluid is an extended object,
different parts of which behave differently. Attaching a vector to different
points of the fluid to represent the velocity at that point, and taking snapshots
at different times, we can get an idea of how the fluid behaves. This is
done constantly (without the arrows, of course) by weather satellites whose
snapshots are sometimes shown on our TV screens and reveal, for example,
the turbulence developed by a hurricane. A complete determination of the
fluid, therefore, entails a specification of the velocity vector at different points
of the fluid for different times. A vector which varies from point to point is
called a vector field. A problem involving a classical fluid is, therefore, solved
once we find its velocity field as a function of position and time. The concept
of a field can be abstracted from the physical reality of the fluid.³ It then
becomes a legitimate physical entity whose specification requires a position,
a time, and a direction (if the field happens to be a vector field), just like the
specification of the velocity field of a fluid.

The reason for going into so much detail in the last two paragraphs is to 
prevent a possible confusion. In the case of velocity and acceleration, one 
divides two quantities and the limit of the ratio turns out to be a function 
of the denominator, and one might get the impression that in (2.1), Q is a 
function of S. This is not the case, as, in general, all three quantities, R, S, 
and Q are functions of other (independent) variables, for instance, the three 
coordinates specifying position and time. 

Velocity and acceleration are examples of the first interpretation of derivative,
the rate of change. There are many situations in which the second interpretation
of derivative is applicable. One important example is the density
of a physical quantity R:


$$\rho_R = \lim_{\Delta V\to 0}\frac{\Delta R}{\Delta V} = \frac{dR}{dV}, \tag{2.2}$$

where $\Delta R$ is the amount of the quantity R in the small volume $\Delta V$. Examples
of densities are mass density $\rho_m$, electric charge density $\rho_q$, number density
$\rho_n$, energy density $\rho_E$, and momentum density $\rho_p$. Sometimes it is convenient
to define surface and linear densities:

$$\sigma_R = \lim_{\Delta a\to 0}\frac{\Delta R}{\Delta a} = \frac{dR}{da}, \qquad \lambda_R = \lim_{\Delta l\to 0}\frac{\Delta R}{\Delta l} = \frac{dR}{dl}, \tag{2.3}$$

where $\Delta R$ is the amount of R on the small area $\Delta a$ or along the small length
$\Delta l$. The most frequently encountered surface density is that of electric charge,
which is commonly found on the surface of a conductor.

Another example of Equation (2.1) is pressure, defined as

$$P = \lim_{\Delta a\to 0}\frac{\Delta F_\perp}{\Delta a} = \frac{dF_\perp}{da}, \tag{2.4}$$

where $\Delta F_\perp$ is the force perpendicular to the surface $\Delta a$. This discussion
makes it clear that the most natural setting for the concept of derivative is
the ratio of two physical quantities which are defined locally. Equations (2.2)
and (2.3) can hardly be interpreted as the rate of change of density with respect
to volume, area, or length!

vector field
density: an example of the second interpretation of derivative
pressure: another example of the second interpretation of derivative

² This is true only in a classical picture of particles. A quantum mechanical picture
does not allow a complete determination of the position and momentum of a particle.

³ Historically, this abstraction was very hard to achieve in the case of electromagnetism,
where, for a long time, a hypothetical “fluid” called aether was assumed to support the
electromagnetic field. It was Einstein who suggested getting rid of the fluid altogether and
attaching physical reality and significance to the field itself.
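Equation (2.2) can be illustrated numerically: compute the amount $\Delta R$ in a small cube around a point, divide by the cube's volume, and watch the ratio settle as the cube shrinks. In the sketch below the density $\rho(x,y,z) = 1 + x^2$ is a hypothetical choice, and $\Delta R$ is estimated with a simple midpoint sum over small cells (a convenience of this sketch, not something the text prescribes). The ratio should approach $\rho(1,0,0) = 2$.

```python
def rho(x, y, z):
    """Hypothetical mass density, used only for illustration: rho = 1 + x**2."""
    return 1.0 + x**2

def mass_in_cube(density, center, h, n=10):
    """Midpoint-sum estimate of the mass Delta M in a cube of side h centred at `center`."""
    cx, cy, cz = center
    w = h / n                      # side of each small cell
    total = 0.0
    for i in range(n):
        for j in range(n):
            for k in range(n):
                x = cx - h / 2 + (i + 0.5) * w
                y = cy - h / 2 + (j + 0.5) * w
                z = cz - h / 2 + (k + 0.5) * w
                total += density(x, y, z) * w**3
    return total

point = (1.0, 0.0, 0.0)
for h in (1.0, 0.1, 0.01):
    dM = mass_in_cube(rho, point, h)
    print(h, dM / h**3)   # Delta M / Delta V settles toward rho(point) = 2 as h shrinks
```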


Descartes said that he “neither admits nor hopes for any principles in Physics other
than those which are in Geometry or in abstract mathematics.” And Nature couldn’t
agree more! The start of modern physics coincides with the start of modern mathematics.
Calculus was, in large part, motivated by the need for a quantitative analysis
of physical problems. Calculation of instantaneous velocities and accelerations,
determination of tangents to lens surfaces, evaluation of the angle corresponding to
the maximum range of a projectile, and calculation of the lengths of curves such
as the orbits of planets around the Sun were only a few of the physical motivations
that instigated the intense activities of seventeenth-century physicists and
mathematicians alike.

The problems mentioned above were tackled by at least a dozen of the greatest 
mathematicians of the seventeenth century and many other minor ones. All of these 
efforts climaxed in the monumental achievements of Newton and Leibniz. Newton, 
in particular, noted the generality of the concept of rate of change—a concept he 
used for calculating instantaneous velocities—and bestowed a universal character 
upon the notion of derivative. 

Of the several methods advanced to find the tangent to a curve, Fermat’s is the 
closest to the modern treatment. He approximates the increment of the tangent line 
with the increment of the function describing the curve and takes the ratio of the 
two increments to find the angle of the tangent line. Fermat, however, ignores the 
question of limits as the increments go to zero, a procedure necessary for finding 
the slope of tangents. Descartes’s method, on the other hand, is purely algebraic and
is not plagued by the question of the limits. However, his method worked only for 
polynomials. 

Another great name associated with the development of calculus is Isaac Barrow 
who used elaborate geometrical methods to find tangents. He was the first to point 
out the connection between integration and differentiation. Barrow was a professor 
of mathematics at Cambridge University. Well versed in both Greek and Arabic (he
was once nominated for a chair of Greek at Cambridge in 1655 but was denied the 
chair due to his loyalist views), he was able to translate some of Euclid’s works and 
to improve the translations of other works of Euclid as well as Archimedes. 

After spending some time in eastern Europe, he returned to England and accepted
the Greek chair denied him before. To supplement his income, he taught
geometry at Gresham College, London. However, he soon gave up his geometry 
chair to serve as the first Lucasian professor of mathematics at Cambridge from 
1663 to 1669, at which time Barrow resigned his chair of mathematics in favor of 
his student Isaac Newton and turned to theological studies. 

His chief work, Lectiones Geometricae, is one of the great contributions to
calculus. In it he used geometrical methods, “freed from the loathsome burdens of
calculations,” as he put it. 


Isaac Barrow
1630-1677


2.2 Partial Derivatives 

All physical quantities are real functions of space and time. This means that,
given the three coordinates of a point in space and an instant of time, we
can associate with them a real number, namely the value of the
physical quantity at that point and time.⁴ Thus, $Q(x,y,z,t)$ is the value of
the physical quantity Q at time t at a point whose Cartesian coordinates are
$(x,y,z)$. Similarly, we write $Q(r,\theta,\varphi,t)$ and $Q(\rho,\varphi,z,t)$ for spherical and
cylindrical coordinates, respectively. Ultimately, then, physical quantities
are functions of four real variables. However, in many circumstances
a quantity may be a function of fewer or more variables. An example of
the former is any static phenomenon, in which the quantity is assumed (really,
approximated) to be independent of time. The quantity is then a function
of only three variables.⁵ Physical quantities that depend on more than four
variables are numerous in physics: in the mechanics of many particles, all
quantities of interest depend, in general, on the coordinates of all particles,
and in thermodynamics one encounters a multitude of thermodynamic variables
upon which many quantities of interest depend.

2.2.1 Definition, Notation, and Basic Properties 

We consider real functions $f(x_1,x_2,\ldots,x_n)$ of many variables. Generalizing
the notation that denotes the set of real numbers by $\mathbb{R}$, the set of
points in a plane by $\mathbb{R}^2$, and those in space by $\mathbb{R}^3$, we consider the n-tuples
$(x_1,x_2,\ldots,x_n)$ as points in a (hyper)space $\mathbb{R}^n$. Similarly, just as the triplet
$(x,y,z)$ can be identified with the position vector $\mathbf{r}$, we abbreviate the n-tuple
$(x_1,x_2,\ldots,x_n)$ by $\mathbf{r}$. Constant n-tuples will be denoted by the same letter
used for components but in boldface type. For example, $(a_1,a_2,\ldots,a_n) = \mathbf{a}$
and $(b_1,b_2,\ldots,b_n) = \mathbf{b}$. This suggests using $\mathbf{x}$ in place of $\mathbf{r}$, and we shall do
so once in a while.

⁴ This statement is not strictly true. There are many physical quantities which require
more than one real number for their specification. A vector is a prime example which
requires three real numbers to be specified. Thus, a vector field, which we discussed earlier,
is really a collection of three real functions.

⁵ If the natural setting of the problem is a surface or a line, then the number of variables
is further reduced to two or one.

partial derivative defined

Since the variables are independent, we can vary any one of them at will
while keeping the others constant. The concept of derivative is now
applied to such a variation, and the result is the partial derivative. To be more
precise, the partial derivative of $f(\mathbf{r})$ with respect to the independent variable
$x_k$ at $(a_1,a_2,\ldots,a_n)$ is denoted⁶ by $(\partial f/\partial x_k)(\mathbf{a})$ and is defined as follows:

$$\frac{\partial f}{\partial x_k}(\mathbf{a}) = \lim_{\epsilon\to 0}\frac{f(a_1,\ldots,a_k+\epsilon,\ldots,a_n) - f(a_1,\ldots,a_k,\ldots,a_n)}{\epsilon}. \tag{2.5}$$

One usually leaves out the a’s and simply writes $\partial f/\partial x_k$, keeping in mind that
the result has to be evaluated at some specific “point” of $\mathbb{R}^n$. As the definition
suggests, the partial derivative with respect to $x_k$ is obtained by the usual rules
of differentiation with the proviso that all the other variables are treated as
constants.
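The definition (2.5) translates directly into a difference quotient in which only the kth argument is shifted. The sketch below uses a small but finite ε in place of the limit, together with a hypothetical test function $f = x_1^2x_2 + x_3$, whose exact partial derivatives at $(1,2,3)$ are 4, 1, and 1.

```python
def partial(f, a, k, eps=1e-6):
    """Difference quotient of Eq. (2.5): shift only the kth argument (k counted from 1)."""
    shifted = list(a)
    shifted[k - 1] += eps   # a_k -> a_k + eps; every other a_i is held fixed
    return (f(*shifted) - f(*a)) / eps

def f(x1, x2, x3):
    """Hypothetical test function f = x1**2 * x2 + x3."""
    return x1**2 * x2 + x3

a = (1.0, 2.0, 3.0)
print([partial(f, a, k) for k in (1, 2, 3)])   # entries close to 4, 1, 1 respectively
```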

A useful strategy is to turn Equation (2.5) around and write the increment
in f in terms of the partial derivative. This is possible because of
the meaning of the limit: the closer ε gets to zero, the better the ratio approximates
the partial derivative. Thus we can drop $\lim_{\epsilon\to 0}$ and approximate
the two sides. After multiplying both sides by ε, we obtain

$$\Delta_k f = f(a_1,\ldots,a_k+\epsilon,\ldots,a_n) - f(a_1,\ldots,a_k,\ldots,a_n) \approx \frac{\partial f}{\partial x_k}\,\epsilon,$$

where the subscript k on the LHS indicates the independent variable being varied.
Sometimes we use the notation $\Delta_k f(\mathbf{a})$ to emphasize the point at which
the increment of the function (due to an increment in the kth argument) is
being evaluated. Most of the time, however, for notational convenience, we
shall leave out the arguments, it being understood that all quantities are to
be evaluated at some specific “point.” Since ε is an increment in $x_k$, it is
natural to denote it as $\Delta x_k$ and write the above equation as

$$\Delta_k f = f(a_1,\ldots,a_k+\Delta x_k,\ldots,a_n) - f(a_1,\ldots,a_k,\ldots,a_n) \approx \frac{\partial f}{\partial x_k}\,\Delta x_k.$$



If two independent variables, say $x_k$ and $x_j$, are varied, we can still find
the increment in f:

$$\begin{aligned}
\Delta_{k,j}f &= f(a_1,\ldots,a_k+\Delta x_k,\ldots,a_j+\Delta x_j,\ldots,a_n) - f(a_1,\ldots,a_k,\ldots,a_j,\ldots,a_n)\\
&= f(a_1,\ldots,a_k+\Delta x_k,\ldots,a_j+\Delta x_j,\ldots,a_n) - f(a_1,\ldots,a_k,\ldots,a_j+\Delta x_j,\ldots,a_n)\\
&\quad + f(a_1,\ldots,a_k,\ldots,a_j+\Delta x_j,\ldots,a_n) - f(a_1,\ldots,a_k,\ldots,a_j,\ldots,a_n),
\end{aligned}$$

where we have added and subtracted the same term on the RHS of this equation.
Now we use the definition of the change in a function at a point to
write

$$\begin{aligned}
\Delta_{k,j}f &= \Delta_k f(a_1,\ldots,a_k,\ldots,a_j+\Delta x_j,\ldots,a_n) + \Delta_j f(a_1,\ldots,a_k,\ldots,a_j,\ldots,a_n)\\
&\approx \frac{\partial f}{\partial x_k}(a_1,\ldots,a_k,\ldots,a_j+\Delta x_j,\ldots,a_n)\,\Delta x_k + \frac{\partial f}{\partial x_j}(a_1,\ldots,a_k,\ldots,a_j,\ldots,a_n)\,\Delta x_j.
\end{aligned}$$

The first term on the RHS expresses the change in the function due to a
change in $x_k$, and the second expresses the change in the function due to a
change in $x_j$. As their arguments show, the two derivatives in the last line
are not evaluated at the same point. However, the difference between these
arguments is small (of order $\Delta x_j$), and once it is multiplied by the small Δx’s
in front of the derivatives, its effect is smaller still. In the limit that $\Delta x_j$ and $\Delta x_k$ go to
zero, we can ignore this subtle difference and write

$$\Delta_{k,j}f \approx \frac{\partial f}{\partial x_k}\,\Delta x_k + \frac{\partial f}{\partial x_j}\,\Delta x_j. \tag{2.6}$$

This shows that the total change is simply the sum of the changes due to $x_j$
and $x_k$.

⁶ This notation may be confusing because of the a’s and the x’s. A better notation will
be introduced shortly.


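Box 2.2.1. In general, the change in f due to a change in all the independent
variables is

$$\Delta f \approx \sum_{k=1}^{n}\frac{\partial f}{\partial x_k}\,\Delta x_k = \frac{\partial f}{\partial x_1}\,\Delta x_1 + \frac{\partial f}{\partial x_2}\,\Delta x_2 + \cdots + \frac{\partial f}{\partial x_n}\,\Delta x_n.$$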


Some of the Δx’s may, of course, be zero. For example, if all of the Δx’s are
zero except $\Delta x_j$ and $\Delta x_k$, then the equation in the Box above reduces to
(2.6). The following example describes a situation which occurs frequently in
thermodynamics.
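Box 2.2.1 can be checked numerically by comparing the true increment Δf with the sum of the terms $(\partial f/\partial x_k)\,\Delta x_k$. The sketch assumes the hypothetical test function $f = x_1^2x_2 + x_3$, with its partial derivatives worked out by hand. Shrinking all the Δx's by a factor of 10 should shrink the mismatch by roughly a factor of 100, because the neglected terms are of second order in the increments.

```python
def f(x1, x2, x3):
    """Hypothetical test function f = x1**2 * x2 + x3."""
    return x1**2 * x2 + x3

def grad_f(x1, x2, x3):
    """Its partial derivatives, computed by hand."""
    return (2 * x1 * x2, x1**2, 1.0)

def increments(a, dx):
    """Return the exact Delta f and the linear estimate sum_k (df/dx_k) Delta x_k."""
    moved = [ai + di for ai, di in zip(a, dx)]
    exact = f(*moved) - f(*a)
    linear = sum(g * d for g, d in zip(grad_f(*a), dx))
    return exact, linear

a = (1.0, 2.0, 3.0)
for scale in (1.0, 0.1, 0.01):
    dx = (1e-3 * scale, -2e-3 * scale, 5e-4 * scale)
    exact, linear = increments(a, dx)
    print(scale, exact, linear, exact - linear)   # the gap shrinks roughly like scale**2
```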

an example that is useful for thermodynamics

Example 2.2.1. Suppose a physical quantity Q is a function of other physical
quantities U, V, and W. We write this as $Q = f(U, V, W)$ with the intention that
U, V, and W are the independent variables. It is possible, however, to solve for one of
the independent variables in terms of Q and the rest of the independent variables.⁷ It
is therefore legitimate to seek the partial derivative of any one of the four quantities
with respect to any other one. Because of the multitude of thermodynamic variables,
it may become confusing as to which variables are kept constant. Therefore, it is
common in thermodynamics to use the variables held constant as subscripts of the
partial derivative. Thus,

$$\left(\frac{\partial Q}{\partial V}\right)_{U,W},\qquad \left(\frac{\partial V}{\partial Q}\right)_{U,W},\qquad \left(\frac{\partial U}{\partial V}\right)_{Q,W} \tag{2.7}$$

are typical examples of partial derivatives, and in principle, one can solve for V in
terms of Q, U, and W and differentiate the resulting function with respect to Q to
find the second term in Equation (2.7). Similarly, one can solve for U in terms of Q,
V, and W and differentiate the resulting function with respect to V to find the last
term. However, Box 2.2.1 allows us to bypass this (sometimes impossible) task and
evaluate derivatives by directly differentiating the given function. Let’s see how.

⁷ That this can be done under very mild assumptions regarding the function f is the
content of the celebrated implicit function theorem proved in higher analysis.

The first term is obvious:

$$\left(\frac{\partial Q}{\partial V}\right)_{U,W} = \left(\frac{\partial f}{\partial V}\right)_{U,W}.$$


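The key to the evaluation of the other two is Box 2.2.1 as applied to Q. We thus
write

$$\Delta Q \approx \left(\frac{\partial f}{\partial U}\right)_{V,W}\Delta U + \left(\frac{\partial f}{\partial V}\right)_{U,W}\Delta V + \left(\frac{\partial f}{\partial W}\right)_{U,V}\Delta W. \tag{2.8}$$

If U and W are kept constant, then $\Delta U = 0 = \Delta W$, and we have

$$\Delta Q \approx \left(\frac{\partial f}{\partial V}\right)_{U,W}\Delta V \quad\Longrightarrow\quad 1 \approx \left(\frac{\partial f}{\partial V}\right)_{U,W}\frac{\Delta V}{\Delta Q}.$$

In the limit that ΔQ goes to zero, the ratio of the Δ’s becomes the corresponding
partial derivative and the approximation becomes equality, leading to the relation

$$1 = \left(\frac{\partial f}{\partial V}\right)_{U,W}\left(\frac{\partial V}{\partial Q}\right)_{U,W}.$$

Changing f to Q and solving for the partial derivative, we obtain

$$\left(\frac{\partial V}{\partial Q}\right)_{U,W} = \frac{1}{(\partial Q/\partial V)_{U,W}}, \tag{2.9}$$

which is a result we should have expected. This equation shows that we don’t have
to solve for V in terms of the other three variables to find its derivative with respect
to Q. Just differentiate $f(U, V, W)$ with respect to V and take its reciprocal!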
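Equation (2.9) can be tested on a relation simple enough to invert by hand. In the sketch below the choice $Q = f(U,V,W) = UV^2 + W$ is hypothetical. The explicit inverse $V(Q,U,W)$ is used only to confirm that the derivative computed from the inverse equals the reciprocal of the one computed from f directly. Central differences stand in for exact derivatives.

```python
import math

def f(U, V, W):
    """Hypothetical relation Q = f(U, V, W) = U * V**2 + W, chosen so it can be inverted."""
    return U * V**2 + W

def V_of(Q, U, W):
    """Explicit inverse V(Q, U, W); needed only to check Eq. (2.9)."""
    return math.sqrt((Q - W) / U)

U, V, W = 2.0, 3.0, 1.0
Q = f(U, V, W)
h = 1e-6

dQ_dV = (f(U, V + h, W) - f(U, V - h, W)) / (2 * h)         # (dQ/dV)_{U,W}, from f directly
dV_dQ = (V_of(Q + h, U, W) - V_of(Q - h, U, W)) / (2 * h)   # (dV/dQ)_{U,W}, from the inverse
print(dQ_dV, dV_dQ, dQ_dV * dV_dQ)   # the product is 1 up to rounding
```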

The last partial derivative is obtained by setting ΔQ and ΔW equal to zero in
(2.8). The result is

$$0 \approx \left(\frac{\partial f}{\partial U}\right)_{V,W}\Delta U + \left(\frac{\partial f}{\partial V}\right)_{U,W}\Delta V \quad\Longrightarrow\quad \frac{\Delta U}{\Delta V} \approx -\frac{(\partial f/\partial V)_{U,W}}{(\partial f/\partial U)_{V,W}}.$$

Once again, taking the limit as ΔV → 0, noting that the LHS becomes a partial
derivative, subscripting this partial with the variables held constant, and substituting
Q for f,⁸˒⁹ we obtain

$$\left(\frac{\partial U}{\partial V}\right)_{Q,W} = -\frac{(\partial Q/\partial V)_{U,W}}{(\partial Q/\partial U)_{V,W}}. \tag{2.10}$$

Thus, by differentiating $f(U, V, W)$ with respect to V and U and taking the ratio,
we obtain the derivative of U with respect to V; no need to solve for U in terms of
the other three variables!

⁸ Recall that if $y = f(x)$, then $dy/dx$ and $df/dx$ represent the same quantity.

⁹ This is an abuse of notation because Q is held constant and the derivative of any
constant is always zero, while the derivative of f is well defined. This abuse of notation is
so common in thermodynamics that we shall adopt it here as well.

Equation (2.10) is usually written in a more symmetric way. By Equation (2.9), the
numerator of the fraction on the RHS equals $1/(\partial V/\partial Q)_{U,W}$. Substituting this and
multiplying both sides by $(\partial V/\partial Q)_{U,W}(\partial Q/\partial U)_{V,W}$, the result can be written as

$$\left(\frac{\partial U}{\partial V}\right)_{Q,W}\left(\frac{\partial V}{\partial Q}\right)_{U,W}\left(\frac{\partial Q}{\partial U}\right)_{V,W} = -1. \tag{2.11}$$

A simpler version of this result, in which the fourth variable W is absent, is commonly
used in thermodynamics. ■

an important relation used often in thermodynamics
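The same hypothetical relation $Q = UV^2 + W$ can be used to check Equations (2.10) and (2.11). The sketch computes $(\partial U/\partial V)_{Q,W}$ twice: once from derivatives of f alone, as in (2.10), and once from the explicit inverse $U = (Q - W)/V^2$. It then forms the triple product of (2.11), with $(\partial V/\partial Q)_{U,W}$ obtained from (2.9). Central differences stand in for exact derivatives; at the chosen point the exact value of $(\partial U/\partial V)_{Q,W}$ is $-4/3$.

```python
def f(U, V, W):
    """Hypothetical relation Q = f(U, V, W) = U * V**2 + W."""
    return U * V**2 + W

def U_of(Q, V, W):
    """Explicit inverse U(Q, V, W), used only as an independent check."""
    return (Q - W) / V**2

def d(func, args, k, h=1e-6):
    """Central difference of func with respect to argument number k (counted from 0)."""
    up, down = list(args), list(args)
    up[k] += h
    down[k] -= h
    return (func(*up) - func(*down)) / (2 * h)

U, V, W = 2.0, 3.0, 1.0
Q = f(U, V, W)

dQdU = d(f, (U, V, W), 0)            # (dQ/dU)_{V,W}
dQdV = d(f, (U, V, W), 1)            # (dQ/dV)_{U,W}

dUdV_from_f = -dQdV / dQdU           # Eq. (2.10): only derivatives of f are needed
dUdV_direct = d(U_of, (Q, V, W), 1)  # (dU/dV)_{Q,W} from the explicit inverse
triple = dUdV_direct * (1 / dQdV) * dQdU   # Eq. (2.11), using (2.9) for (dV/dQ)_{U,W}

print(dUdV_from_f, dUdV_direct, triple)    # the first two agree; the product is close to -1
```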


A word of caution about notation is in order. We chose the set of variables
$(x_1,x_2,\ldots,x_n)$ as arguments of the function f, and then denoted the
derivative by $\partial f/\partial x_k$. We could have chosen any other set of symbols, such
as $(y_1,y_2,\ldots,y_n)$ or $(t_1,t_2,\ldots,t_n)$, as the arguments. Then we would have
had to write $\partial f/\partial y_k$ or $\partial f/\partial t_k$ for partial derivatives. This freedom of choice
can become confusing because little effort is made in the literature to distinguish
between the “free” general arguments and the specific point at which
the derivative is to be evaluated. For example, the symbol $(\partial f/\partial x)(y,x)$ can
be interpreted in two ways: it can be the derivative of a function of two variables
with respect to its first argument, subsequently evaluated at the point
with coordinates $(y,x)$, or it could be the derivative with respect to the second
argument, in a seemingly strange world in which y is used as the first
argument! The longstanding usage of x as the first partner of a doublet by no
means reserves the first slot for x at all times. Therefore, the confusion above
is indeed a legitimate one.

confusion surrounding the expression (∂f/∂x)(y, x) and a notation that resolves the confusion

We started the discussion by distinguishing between the free arguments
$(x_1,x_2,\ldots,x_n)$ and the specific point $(a_1,a_2,\ldots,a_n)$. However, making this
distinction every time we write down a partial derivative can become very
clumsy. Nevertheless, the reader should always keep this distinction in mind
and write it down explicitly whenever necessary. To minimize the confusion,
we leave out all symbols but keep only the position of the variable in the array.
Specifically,

Box 2.2.2. We write $\partial_k f$ for the derivative of f with respect to its
kth argument. This derivative is a function; we can evaluate it at
$(a_1,a_2,\ldots,a_n)$, for which we write $\partial_k f(a_1,a_2,\ldots,a_n) = \partial_k f(\mathbf{a})$.

This notation avoids any reference to the “free” arguments. One can choose
any symbol for the free arguments; the final answer is independent of this
choice:

$$\partial_k f(a_1,a_2,\ldots,a_n) = \left.\frac{\partial f(t_1,t_2,\ldots,t_n)}{\partial t_k}\right|_{\mathbf{t}=\mathbf{a}} = \left.\frac{\partial f(y_1,y_2,\ldots,y_n)}{\partial y_k}\right|_{\mathbf{y}=\mathbf{a}} = \left.\frac{\partial f(\eta_1,\eta_2,\ldots,\eta_n)}{\partial \eta_k}\right|_{\boldsymbol{\eta}=\mathbf{a}}$$

because the only thing that matters is the index k, which tells us with respect
to which variable we are differentiating.

order of differentiation in a mixed derivative is immaterial

Example 2.2.2. Consider the function $f(x,y,z) = e^{xy/z}$. We write it first as
$f(x_1,x_2,x_3) = e^{x_1x_2/x_3}$. Then

$$\partial_1 f(x_1,x_2,x_3) = \frac{x_2}{x_3}\,e^{x_1x_2/x_3},\qquad \partial_2 f(x_1,x_2,x_3) = \frac{x_1}{x_3}\,e^{x_1x_2/x_3},\qquad \partial_3 f(x_1,x_2,x_3) = -\frac{x_1x_2}{x_3^2}\,e^{x_1x_2/x_3}.$$

Now that the functional forms of all partial derivatives have been derived, we can evaluate
them at any point we want. For example,

$$\partial_2 f(1,2,3) = \tfrac{1}{3}\,e^{2/3},\qquad \partial_3 f(1,1,1) = -e,$$
$$\partial_1 f(t,u,v) = \frac{u}{v}\,e^{tu/v},\qquad \partial_3 f(z,x,y) = -\frac{zx}{y^2}\,e^{zx/y}. \quad ■$$
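The slot-based notation of Box 2.2.2 maps naturally onto a higher-order function: given f and an index k, return a new function that differentiates with respect to whatever is passed in the kth position, regardless of what that argument is called. The sketch below is a central-difference approximation rather than symbolic differentiation. It applies this idea to the function of Example 2.2.2 and compares the results with the exact values found there. The name `partial` is illustrative.

```python
import math

def partial(f, k, h=1e-6):
    """Return an approximation to the function d_k f (slot k counted from 1).

    Only the slot position matters; the names of the arguments never appear."""
    def dkf(*args):
        up, down = list(args), list(args)
        up[k - 1] += h
        down[k - 1] -= h
        return (f(*up) - f(*down)) / (2 * h)
    return dkf

def f(x1, x2, x3):
    """The function of Example 2.2.2: f = exp(x1 * x2 / x3)."""
    return math.exp(x1 * x2 / x3)

d2f = partial(f, 2)
d3f = partial(f, 3)
print(d2f(1, 2, 3), math.exp(2 / 3) / 3)   # d_2 f(1,2,3) = (1/3) e^{2/3}
print(d3f(1, 1, 1), -math.e)               # d_3 f(1,1,1) = -e
z, x, y = 2.0, 0.5, 1.5
print(d3f(z, x, y), -(z * x / y**2) * math.exp(z * x / y))   # d_3 f(z,x,y), names irrelevant
```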


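Higher-order derivatives are defined just as in the single-variable case,
except that now mixed derivatives are also possible. Thus,

$$\partial_1(\partial_1 f) = \partial_1^2 f = \frac{\partial^2 f}{\partial x_1^2}, \qquad \partial_1(\partial_5 f) = \frac{\partial^2 f}{\partial x_1\, \partial x_5}, \qquad \partial_j(\partial_k f) = \frac{\partial^2 f}{\partial x_j\, \partial x_k}$$

are all legitimate derivatives. An important property of mixed derivatives is
that, for well-behaved functions (for example, functions whose second partial derivatives are continuous), the order of differentiation is immaterial.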
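The symmetry of mixed derivatives can be checked numerically for the function of Example 2.2.2. This sketch nests central differences: the helper `partial` (our own name) returns an approximation of $\partial_k f$ as a new function. The two symmetric stencils give nearly the same number by construction. The meaningful check is therefore against the value obtained by hand, $\partial_1\partial_3 f = -(x_2/x_3^2)(1 + x_1 x_2/x_3)\, e^{x_1 x_2/x_3}$, which equals $-\tfrac{10}{27} e^{2/3}$ at $(1, 2, 3)$.

```python
import math

def partial(f, k, h=1e-4):
    """Return a function approximating d_k f (derivative in the k-th slot,
    counted from 1) by a central difference."""
    def dkf(*p):
        plus, minus = list(p), list(p)
        plus[k - 1] += h
        minus[k - 1] -= h
        return (f(*plus) - f(*minus)) / (2 * h)
    return dkf

def f(x1, x2, x3):
    return math.exp(x1 * x2 / x3)

point = (1.0, 2.0, 3.0)
d3_of_d1 = partial(partial(f, 1), 3)(*point)   # slot 1 first, then slot 3
d1_of_d3 = partial(partial(f, 3), 1)(*point)   # slot 3 first, then slot 1
# by hand: -(x2/x3^2)(1 + x1 x2/x3) exp(x1 x2/x3) = -(10/27) e^(2/3) at (1, 2, 3)
exact = -(10 / 27) * math.exp(2 / 3)
print(d3_of_d1, d1_of_d3, exact)
```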


Example 2.2.3. Functions which can be written as the product of single-variable
functions are important in the solution of partial differential equations. Suppose
that $F(x, y, z) = f(x)g(y)h(z)$. Then $\partial_1 F(x, y, z) = f'(x)g(y)h(z)$ and the function

$$\frac{\partial_1 F}{F}(x, y, z) = \frac{f'(x)g(y)h(z)}{f(x)g(y)h(z)} = \frac{f'(x)}{f(x)}$$

is seen to be independent of $y$ and $z$. One can show similarly that

$$\frac{\partial_2 F}{F}(x, y, z) = \frac{g'(y)}{g(y)}, \qquad \frac{\partial_3 F}{F}(x, y, z) = \frac{h'(z)}{h(z)},$$

each one depending on only one variable.
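A quick numerical illustration of Example 2.2.3 follows. The factors $f$, $g$, and $h$ below are hypothetical choices; any nonvanishing differentiable single-variable functions would do. The sketch evaluates $\partial_1 F/F$ at a fixed $x$ but at several different $(y, z)$, and every value equals $f'(x)/f(x)$.

```python
import math

def f(x): return x**2 + 1.0          # illustrative, nonvanishing factors
def g(y): return math.cos(y) + 2.0
def h(z): return math.exp(0.5 * z)

def F(x, y, z):
    return f(x) * g(y) * h(z)        # a product of single-variable functions

def d1(F, x, y, z, eps=1e-6):
    # central-difference estimate of the derivative in the first slot
    return (F(x + eps, y, z) - F(x - eps, y, z)) / (2 * eps)

x = 0.7
ratios = [d1(F, x, y, z) / F(x, y, z) for y, z in [(0.0, 0.0), (1.3, -2.0), (3.0, 4.5)]]
expected = 2 * x / (x**2 + 1.0)      # f'(x)/f(x) for f(x) = x^2 + 1
print(ratios, expected)
```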


Example 2.2.4. It is sometimes necessary to find the most general function one of
whose partial derivatives is given. This can be done by antidifferentiating (taking the
indefinite integral) with respect to the variable of the partial derivative, treating the rest
of the variables as constants. The usual “constant” of integration is replaced by a function
of the variables not involved in the differentiation. For example, suppose $\partial_3 f(z, x, y) = y e^{x^2 y^2/z}$.
Since the third variable is $y$, and the partial derivative is with respect to the third variable,
we need to integrate with respect to $y$, keeping $x$ and $z$ constant. This gives

$$f(z, x, y) = \frac{z}{2x^2}\, e^{x^2 y^2/z} + g(x, z) \quad\Longrightarrow\quad f(x, y, z) = \frac{x}{2y^2}\, e^{y^2 z^2/x} + g(y, x),$$

where $g$, the “constant” of integration, is an arbitrary function of the first two
variables. ■
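The antiderivative in Example 2.2.4 can be checked by differentiating it back. This is a small sketch under our own assumptions. The particular "constant" of integration `g` is hypothetical, and any function of the first two slots would do. The code evaluates $\partial_3 f(z, x, y)$ by a central difference and compares it with $y e^{x^2 y^2/z}$.

```python
import math

def g(a, b):
    # hypothetical "constant" of integration: any function of x and z
    return math.sin(a) * b**3

def f(s1, s2, s3):
    # f(z, x, y) = (z / (2 x^2)) exp(x^2 y^2 / z) + g(x, z), written slot by slot:
    # slot 1 holds z, slot 2 holds x, slot 3 holds y
    z, x, y = s1, s2, s3
    return z / (2 * x**2) * math.exp(x**2 * y**2 / z) + g(x, z)

def d3(f, p, h=1e-6):
    a, b, c = p
    return (f(a, b, c + h) - f(a, b, c - h)) / (2 * h)

z, x, y = 2.0, 0.8, 1.1
numeric = d3(f, (z, x, y))
target = y * math.exp(x**2 * y**2 / z)   # the given partial derivative
print(numeric, target)
```

Changing `g` leaves `numeric` unchanged, which is exactly why $g$ may be an arbitrary function of the first two variables.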
Figure 2.1: The tangent line at $x_0$ approximates the curve in a small neighborhood of
$x_0$. If confined in this neighborhood, i.e., if $\Delta x$ (which is equal to $dx$) is small, $\Delta f$
and $df$ are approximately equal. However, $df$ is defined regardless of the size of $\Delta x$.


2.2.2 Differentials 

We now introduce the notion of differentials. Recall from calculus that, in
the case of one variable, the differential of a function is related to a linear
approximation of that function (see Figure 2.1). Basically, the tangent line at
a point $x_0$ is considered as the linear approximation to the curve representing
the function $f$ in the neighborhood of $x_0$. The increment in the value of the
function representing the tangent line, denoted by $df(x_0)$, when $x_0$ changes
to $x_0 + \Delta x$, is given by

$$df(x_0) = f'(x_0)\, dx,$$

where, as a matter of notation, $\Delta x$ has been replaced by $dx$, because by definition,
the differential of an independent variable is nothing but its increment.
The above equation is not an approximation: $dx$ can be any number, large or
small, and $df(x_0)$ will be correspondingly large or small. The approximation
starts when we try to replace $\Delta f$ with $df$: The smaller the $\Delta x = dx$, the
better the approximation $\Delta f(x_0) \approx df(x_0)$. The generalization of this idea
to two variables involves approximating the surface representing the function
$f(x, y)$ by its tangent plane. For more variables, no visualizable geometric
interpretation is possible, but the basic idea is to replace the $\Delta$'s with $d$'s and
the approximation with equality in Box 2.2.1. The result is

$$df = \frac{\partial f}{\partial x_1}\, dx_1 + \frac{\partial f}{\partial x_2}\, dx_2 + \cdots + \frac{\partial f}{\partial x_n}\, dx_n = \sum_{i=1}^{n} \frac{\partial f}{\partial x_i}\, dx_i. \tag{2.12}$$
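The difference between the exact increment $\Delta f$ and the differential $df$ of Equation (2.12) can be seen numerically. In this sketch the test function $f(x, y) = x^2 y + \sin y$, the point, and the direction of the increments are arbitrary choices of ours. The partial derivatives are coded by hand. The gap $|\Delta f - df|$ shrinks roughly like the square of the step, which is what a linear approximation should do.

```python
import math

def f(x, y):
    return x**2 * y + math.sin(y)

def grad(x, y):
    # partial derivatives computed by hand: d_1 f = 2xy, d_2 f = x^2 + cos y
    return 2 * x * y, x**2 + math.cos(y)

x0, y0 = 1.0, 0.5
errors = []
for step in (1e-1, 1e-2, 1e-3):
    dx, dy = step, -2 * step                 # the increments may be any numbers
    delta_f = f(x0 + dx, y0 + dy) - f(x0, y0)
    fx, fy = grad(x0, y0)
    df = fx * dx + fy * dy                   # the differential, Equation (2.12)
    errors.append(abs(delta_f - df))
    print(step, delta_f, df)
```

Each tenfold reduction of the step reduces the error by roughly a factor of one hundred.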

We note that the $dx_i$'s in Equation (2.12) determine the independent variables
on which $f$ depends, and the coefficient of $dx_i$ is $\partial f/\partial x_i$. This observation is
the basis of transforming functions in such a way that the resulting functions
depend on variables which are physically more useful. To be specific, suppose a
function $f$ exists which depends on $(x, y, z)$, but from a physical perspective,
a function which depends on the derivative of $f$ with respect to its second
argument, and not on the second argument itself, is more valuable. Such a
function can be obtained by a Legendre transformation of $f$, which subtracts
from $f$ the product of the second argument and the derivative
of $f$ with respect to that argument. So, define a new function $g$ by

$$g = f - y\,\partial_2 f = f - yh, \qquad \text{where } h \equiv \partial_2 f.$$

Then, we get

$$dg = df - h\,dy - y\,dh = \partial_1 f\,dx + \partial_2 f\,dy + \partial_3 f\,dz - h\,dy - y\,dh = \partial_1 f\,dx + \partial_3 f\,dz - y\,dh.$$


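The differentials on the RHS of the last line indicate that the “natural”
independent variables for $g$ are $x$, $z$, and $h$, and that

$$\frac{\partial g}{\partial x} = \partial_1 f, \qquad \frac{\partial g}{\partial z} = \partial_3 f, \qquad \frac{\partial g}{\partial h} = -y.$$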


Legendre transformation is used frequently in thermodynamics and mechanics. 
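The Legendre transformation can be checked on a concrete function. The following is a minimal sketch with an illustrative choice that is not from the text: $f(x, y, z) = x y^2 z$ with $x, z > 0$. Then $h = \partial_2 f = 2xyz$, and we invert this by hand to express $y$ in terms of $(x, h, z)$. The code forms $g = f - yh$ as a function of its natural variables and confirms $\partial g/\partial x = \partial_1 f$, $\partial g/\partial z = \partial_3 f$, and $\partial g/\partial h = -y$.

```python
def f(x, y, z):
    return x * y**2 * z               # illustrative choice, x and z positive

def h_of(x, y, z):
    return 2 * x * y * z              # h = d_2 f

def y_of(x, h, z):
    return h / (2 * x * z)            # inverse of h = 2xyz, solved by hand

def g(x, h, z):
    y = y_of(x, h, z)
    return f(x, y, z) - y * h         # Legendre transform: g = f - y d_2 f

def d(fun, k, p, eps=1e-6):
    # central difference in slot k (counted from 0 here)
    plus, minus = list(p), list(p)
    plus[k] += eps
    minus[k] -= eps
    return (fun(*plus) - fun(*minus)) / (2 * eps)

x, y, z = 1.5, 0.4, 2.0
h = h_of(x, y, z)
dg_dx = d(g, 0, (x, h, z))   # expected d_1 f = y^2 z
dg_dh = d(g, 1, (x, h, z))   # expected -y
dg_dz = d(g, 2, (x, h, z))   # expected d_3 f = x y^2
print(dg_dx, dg_dh, dg_dz)
```

Each partial of $g$ is taken with its other natural variables held fixed, and it is in this sense that $h$, not $y$, is the independent variable.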


Example 2.2.5. The internal energy $U$ of a thermodynamical system is a function
of entropy $S$, volume $V$, and number of moles $N$. These variables are called the
natural variables of $U$, and we write $U(S, V, N)$. Temperature $T$, pressure $P$, and
chemical potential $\mu$ are defined as follows:

$$T = \left(\frac{\partial U}{\partial S}\right)_{V,N}, \qquad P = -\left(\frac{\partial U}{\partial V}\right)_{S,N}, \qquad \mu = \left(\frac{\partial U}{\partial N}\right)_{S,V},$$

where, as is common in thermodynamics, we have indicated the variables that are
held constant as subscripts. Entropy is a hard quantity to measure: If we were to
measure $\partial U/\partial S$, we would have to find the ratio of the change of $U$ to that of $S$,
not an easy task! On the other hand, $T$ is easy to measure, and thus it is desirable
to Legendre transform $U$ to obtain a function which has $T$ as a natural variable.
The Helmholtz free energy $F$ is defined as $F = U - ST$. We note that since

$$dU = \frac{\partial U}{\partial S}\, dS + \frac{\partial U}{\partial V}\, dV + \frac{\partial U}{\partial N}\, dN = T\,dS - P\,dV + \mu\,dN,$$

we have

$$dF = dU - S\,dT - T\,dS = \underbrace{T\,dS - P\,dV + \mu\,dN}_{=\,dU} - S\,dT - T\,dS = -S\,dT - P\,dV + \mu\,dN$$

and, therefore,

$$\left(\frac{\partial F}{\partial T}\right)_{V,N} = -S, \qquad \left(\frac{\partial F}{\partial V}\right)_{T,N} = -P, \qquad \left(\frac{\partial F}{\partial N}\right)_{T,V} = \mu.$$

Helmholtz free energy is by far the most frequently used thermodynamic function,
because all its “natural” variables, namely, $T$, $V$, and $N$, are easily measurable
quantities. ■
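As a numerical sanity check of these relations, the sketch below assumes a toy fundamental relation $U = C S^3/(NV)$ with a hypothetical positive constant $C$. This is not a model used in the text. It is chosen only because it is homogeneous of degree one and $T = \partial U/\partial S = 3CS^2/(NV)$ can be inverted by hand. The code builds $F(T, V, N) = U - TS$ and compares its central-difference derivatives with $-S$, $-P$, and $\mu$.

```python
import math

C = 1.0  # hypothetical positive constant of the toy model

def U(S, V, N):
    return C * S**3 / (N * V)

def S_of(T, V, N):
    # invert T = dU/dS = 3 C S^2 / (N V) for S > 0
    return math.sqrt(T * N * V / (3 * C))

def F(T, V, N):
    S = S_of(T, V, N)
    return U(S, V, N) - T * S        # Helmholtz free energy F = U - TS

T, V, N = 2.0, 1.5, 3.0
eps = 1e-6
dF_dT = (F(T + eps, V, N) - F(T - eps, V, N)) / (2 * eps)
dF_dV = (F(T, V + eps, N) - F(T, V - eps, N)) / (2 * eps)
dF_dN = (F(T, V, N + eps) - F(T, V, N - eps)) / (2 * eps)

S = S_of(T, V, N)
P = C * S**3 / (N * V**2)             # P = -dU/dV at fixed S, N
mu = -C * S**3 / (N**2 * V)           # mu = dU/dN at fixed S, V
print(dF_dT, -S)
print(dF_dV, -P)
print(dF_dN, mu)
```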
2.2.3 Chain Rule

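In many cases of physical interest, the “independent” variables $x_i$ may in
turn depend on one or more variables. Let us denote these new independent
variables by $(t_1, t_2, \ldots, t_m)$ and the functional dependence of $x_i$ by $g_i$, so that

$$x_i = g_i(t_1, t_2, \ldots, t_m) \equiv g_i(\mathbf{t}), \qquad i = 1, 2, \ldots, n. \tag{2.13}$$

As the $t$'s vary, so will the $x$'s and consequently the function $f$. Therefore, $f$
becomes dependent on the $t$'s and we can talk about partial derivatives of $f$
with respect to one of the $t$'s. To find such a partial derivative, we go back to
Box 2.2.1 and substitute for $\Delta x_i$ in terms of the $\Delta t$'s. From (2.13), we have

$$\Delta x_i \approx \frac{\partial g_i}{\partial t_1}\,\Delta t_1 + \frac{\partial g_i}{\partial t_2}\,\Delta t_2 + \cdots + \frac{\partial g_i}{\partial t_m}\,\Delta t_m = \sum_{j=1}^{m} \frac{\partial g_i}{\partial t_j}\,\Delta t_j, \qquad i = 1, 2, \ldots, n.$$

Substituting this in the equation of Box 2.2.1 yields

$$\Delta f \approx \sum_{i=1}^{n} \sum_{j=1}^{m} \frac{\partial f}{\partial x_i}\frac{\partial g_i}{\partial t_j}\,\Delta t_j. \tag{2.14}$$

Now suppose that we keep all of the $t$'s constant except for one, say $t_7$. Then
$\Delta t_j = 0$ for all $j$ except $j = 7$, and the sum over $j$ will have only one nonzero
term, i.e., the seventh term. In such a case, Equation (2.14) becomes

$$\Delta f \approx \frac{\partial f}{\partial x_1}\frac{\partial g_1}{\partial t_7}\,\Delta t_7 + \frac{\partial f}{\partial x_2}\frac{\partial g_2}{\partial t_7}\,\Delta t_7 + \cdots + \frac{\partial f}{\partial x_n}\frac{\partial g_n}{\partial t_7}\,\Delta t_7 = \sum_{i=1}^{n} \frac{\partial f}{\partial x_i}\frac{\partial g_i}{\partial t_7}\,\Delta t_7.$$

Dividing both sides by $\Delta t_7$, taking the limit, and replacing the approximation by
equality, we obtain

$$\frac{\partial f}{\partial t_7} = \frac{\partial f}{\partial x_1}\frac{\partial g_1}{\partial t_7} + \frac{\partial f}{\partial x_2}\frac{\partial g_2}{\partial t_7} + \cdots + \frac{\partial f}{\partial x_n}\frac{\partial g_n}{\partial t_7} = \sum_{i=1}^{n} \frac{\partial f}{\partial x_i}\frac{\partial g_i}{\partial t_7}.$$

Instead of $t_7$, we could have used any other one of the $t$'s, say $t_{19}$ or $t_{217}$.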

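Theorem 2.2.6. (The Chain Rule). Let $f(\mathbf{x})$ be a function of the $x_i$ and
$x_i = g_i(\mathbf{t})$. Let $h(\mathbf{t}) = f(g_1(\mathbf{t}), g_2(\mathbf{t}), \ldots, g_n(\mathbf{t}))$ be a function of the $t_k$, called
the composite of $f$ and the $g_i$. If $t_p$ is any one of these $t$'s, then

$$\partial_p h(\mathbf{t}) = \sum_{i=1}^{n} \partial_i f(\mathbf{g}(\mathbf{t}))\, \partial_p g_i(\mathbf{t}), \tag{2.15}$$

where $\mathbf{g} = (g_1, g_2, \ldots, g_n)$ and $\mathbf{g}(\mathbf{t}) = (g_1(\mathbf{t}), g_2(\mathbf{t}), \ldots, g_n(\mathbf{t}))$.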
In words, the chain rule states that to evaluate the partial derivative of $h$
with respect to its $p$th argument (of which there are $m$) at $\mathbf{t}$, multiply the $i$th
partial of $f$ evaluated at $\mathbf{g}(\mathbf{t})$ by the $p$th partial of $g_i$ evaluated at $\mathbf{t}$ and sum
over $i$.
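Equation (2.15) can be tested directly on a small composite. The following is a minimal sketch in which $f(x_1, x_2) = x_1^2 x_2$, $g_1(\mathbf{t}) = t_1 + t_2 t_3$, and $g_2(\mathbf{t}) = \sin(t_1 t_3)$ are illustrative choices of ours. The partials of $f$ and of the $g_i$ are coded by hand. Their chain-rule combination is compared with a central-difference derivative of the composite $h$ in each slot $p$ (counted from 0 in the code).

```python
import math

def f(x1, x2):
    return x1**2 * x2

def df(x1, x2):           # (d_1 f, d_2 f), computed by hand
    return (2 * x1 * x2, x1**2)

def g(t):                 # g = (g_1, g_2), each a function of t = (t1, t2, t3)
    t1, t2, t3 = t
    return (t1 + t2 * t3, math.sin(t1 * t3))

def dg(t):                # dg[i][p] = d_p g_i, computed by hand
    t1, t2, t3 = t
    return ((1.0, t3, t2),
            (t3 * math.cos(t1 * t3), 0.0, t1 * math.cos(t1 * t3)))

def h(t):                 # the composite h(t) = f(g_1(t), g_2(t))
    return f(*g(t))

def chain_rule(t, p):
    # Equation (2.15): d_p h(t) = sum_i d_i f(g(t)) d_p g_i(t)
    fi = df(*g(t))
    gi = dg(t)
    return sum(fi[i] * gi[i][p] for i in range(2))

def numeric(t, p, eps=1e-6):
    plus, minus = list(t), list(t)
    plus[p] += eps
    minus[p] -= eps
    return (h(plus) - h(minus)) / (2 * eps)

t = (0.3, 1.2, -0.7)
pairs = [(chain_rule(t, p), numeric(t, p)) for p in range(3)]
print(pairs)
```

Note that $\partial_i f$ is evaluated at $\mathbf{g}(\mathbf{t})$, not at $\mathbf{t}$; this is the point the precise form of the theorem is careful about.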

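Sometimes the chain rule is written in the following less precise form:

$$\frac{\partial f}{\partial t_p} = \sum_{i=1}^{n} \frac{\partial f}{\partial g_i}\frac{\partial g_i}{\partial t_p} = \sum_{i=1}^{n} \frac{\partial f}{\partial x_i}\frac{\partial x_i}{\partial t_p}, \tag{2.16}$$

where in the last equality we have substituted $x_i$ for $g_i$.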

Example 2.2.7. Suppose $F$ is a function of three variables given by

$$F(x, y, z) = f\!\left(\frac{x^2 y}{a z^2}\right),$$

where $f$ is some given function, and $a$ is a constant. Let us calculate all partial
derivatives of $F$ at $(a, 2a, a)$ assuming that $f'(2) = a$. Denote the single variable of
$f$ by $u$, so that $F$ is obtained by substituting $x^2 y/(a z^2)$ for $u$ in $f(u)$. The chain
rule gives

$$\partial_1 F(x, y, z) = f'(u)\,\partial_1 u = f'(u)\,\frac{2xy}{az^2}, \qquad \partial_2 F(x, y, z) = f'(u)\,\partial_2 u = f'(u)\,\frac{x^2}{az^2}, \qquad \partial_3 F(x, y, z) = f'(u)\,\partial_3 u = -2f'(u)\,\frac{x^2 y}{az^3}.$$

If $x = a$, $y = 2a$, and $z = a$, then $u = a^2(2a)/a^3 = 2$, and

$$\partial_1 F(a, 2a, a) = f'(2)\,\frac{2a(2a)}{a \cdot a^2} = a \cdot \frac{4}{a} = 4.$$

Similarly, $\partial_2 F(a, 2a, a) = 1$ and $\partial_3 F(a, 2a, a) = -4$.

In the notation of Theorem 2.2.6, there are three $t$'s: $t_1 = x$, $t_2 = y$, $t_3 = z$, and
only one $g$: $g(t_1, t_2, t_3) = t_1^2 t_2/(a t_3^2)$. Then $F$ becomes the composite function of $f$
and $g$. ■
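The values $4$, $1$, and $-4$ of Example 2.2.7 can be reproduced numerically. The sketch assumes a hypothetical value of $a$ and a hypothetical $f$. The example fixes only $f'(2) = a$, so we take $f(u) = au + (u - 2)^2$, which satisfies that condition. Any other $f$ with $f'(2) = a$ gives the same three numbers.

```python
A = 1.7                          # the constant a (value chosen arbitrarily)

def f(u):
    # hypothetical f satisfying f'(2) = a
    return A * u + (u - 2.0)**2

def F(x, y, z):
    return f(x**2 * y / (A * z**2))

def partial(k, p, eps=1e-6):
    # central difference in slot k (counted from 0 here)
    plus, minus = list(p), list(p)
    plus[k] += eps
    minus[k] -= eps
    return (F(*plus) - F(*minus)) / (2 * eps)

point = (A, 2 * A, A)            # the point (a, 2a, a), where u = 2
grads = [partial(k, point) for k in range(3)]
print(grads)                     # close to 4, 1, -4
```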


Example 2.2.8. One of the important occasions of the use of the chain rule is in
the transformation of derivatives from Cartesian to spherical coordinates. A good
example of such a transformation occurs in quantum mechanics, where an expression
such as $x\,\partial f/\partial y - y\,\partial f/\partial x$ turns out to be related to angular momentum, and it is
most conveniently expressed in spherical coordinates. In this example we go through
the detailed exercise of converting that expression into spherical coordinates.

We start with the transformations

$$x = r\sin\theta\cos\varphi, \qquad y = r\sin\theta\sin\varphi, \qquad z = r\cos\theta,$$

and their inverse

$$r = \sqrt{x^2 + y^2 + z^2}, \qquad \cos\theta = \frac{z}{\sqrt{x^2 + y^2 + z^2}}, \qquad \tan\varphi = \frac{y}{x}. \tag{2.17}$$
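We shall need the derivatives of spherical coordinates with respect to $x$ and $y$ written
in terms of spherical coordinates. We easily find these by differentiating both sides
of the equations in (2.17):

$$\frac{\partial r}{\partial x} = \frac{x}{\sqrt{x^2 + y^2 + z^2}} = \frac{x}{r} = \sin\theta\cos\varphi,$$

$$-\sin\theta\,\frac{\partial\theta}{\partial x} = -\frac{xz}{r^3} \;\Longrightarrow\; \frac{\partial\theta}{\partial x} = \frac{xz}{r^3\sin\theta} = \frac{\cos\theta\cos\varphi}{r},$$

$$\sec^2\varphi\,\frac{\partial\varphi}{\partial x} = -\frac{y}{x^2} \;\Longrightarrow\; \frac{\partial\varphi}{\partial x} = -\frac{y\cos^2\varphi}{x^2} = -\frac{\sin\varphi}{r\sin\theta}.$$

Similarly,

$$\frac{\partial r}{\partial y} = \sin\theta\sin\varphi, \qquad \frac{\partial\theta}{\partial y} = \frac{\cos\theta\sin\varphi}{r}, \qquad \frac{\partial\varphi}{\partial y} = \frac{\cos\varphi}{r\sin\theta}.$$

Therefore, using the chain rule as given in Equation (2.16), we get

$$\frac{\partial f}{\partial x} = \frac{\partial f}{\partial r}\frac{\partial r}{\partial x} + \frac{\partial f}{\partial\theta}\frac{\partial\theta}{\partial x} + \frac{\partial f}{\partial\varphi}\frac{\partial\varphi}{\partial x} = \sin\theta\cos\varphi\,\frac{\partial f}{\partial r} + \frac{\cos\theta\cos\varphi}{r}\,\frac{\partial f}{\partial\theta} - \frac{\sin\varphi}{r\sin\theta}\,\frac{\partial f}{\partial\varphi},$$

$$\frac{\partial f}{\partial y} = \frac{\partial f}{\partial r}\frac{\partial r}{\partial y} + \frac{\partial f}{\partial\theta}\frac{\partial\theta}{\partial y} + \frac{\partial f}{\partial\varphi}\frac{\partial\varphi}{\partial y} = \sin\theta\sin\varphi\,\frac{\partial f}{\partial r} + \frac{\cos\theta\sin\varphi}{r}\,\frac{\partial f}{\partial\theta} + \frac{\cos\varphi}{r\sin\theta}\,\frac{\partial f}{\partial\varphi}.$$

If we multiply the first of the last two equations by $y = r\sin\theta\sin\varphi$ and subtract
it from the second equation multiplied by $x = r\sin\theta\cos\varphi$, the terms involving
derivatives with respect to $r$ and $\theta$ cancel while the terms with $\varphi$ derivatives add to
give

$$x\,\frac{\partial f}{\partial y} - y\,\frac{\partial f}{\partial x} = \frac{\partial f}{\partial\varphi}.$$

Details are left as an exercise for the reader.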
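Without carrying out the exercise, one can at least confirm the identity numerically. The sketch uses an arbitrary test function $f(x, y, z) = x^2 y + zx$ of our own choosing. It computes $x\,\partial f/\partial y - y\,\partial f/\partial x$ with Cartesian central differences and compares the result with $\partial f/\partial\varphi$, obtained by differentiating $f(r\sin\theta\cos\varphi, r\sin\theta\sin\varphi, r\cos\theta)$ with respect to $\varphi$.

```python
import math

def f(x, y, z):
    return x**2 * y + z * x          # arbitrary smooth test function

def cart_from_sph(r, th, ph):
    return (r * math.sin(th) * math.cos(ph),
            r * math.sin(th) * math.sin(ph),
            r * math.cos(th))

def d(fun, k, p, eps=1e-6):
    # central difference in slot k (counted from 0 here)
    plus, minus = list(p), list(p)
    plus[k] += eps
    minus[k] -= eps
    return (fun(*plus) - fun(*minus)) / (2 * eps)

r, th, ph = 1.3, 0.9, 2.1
x, y, z = cart_from_sph(r, th, ph)
lhs = x * d(f, 1, (x, y, z)) - y * d(f, 0, (x, y, z))      # x df/dy - y df/dx

def f_sph(r_, t_, p_):
    # the same function expressed in spherical coordinates
    return f(*cart_from_sph(r_, t_, p_))

rhs = d(f_sph, 2, (r, th, ph))                               # df/dphi
print(lhs, rhs)
```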


There is a multitude of examples in thermodynamics for which a mastery
of the techniques of partial differentiation is essential. A property that is used
often in thermodynamics is the homogeneity of functions, which we discuss below.


2.2.4 Homogeneous Functions 

A function is called homogeneous of degree $q$ if multiplying all of its arguments
by a parameter $\lambda$ results in the multiplication of the function itself by $\lambda^q$. More
precisely,


Box 2.2.3. We say that $f(x_1, x_2, \ldots, x_n)$ is homogeneous of degree $q$
if $f(\lambda x_1, \lambda x_2, \ldots, \lambda x_n) = \lambda^q f(x_1, x_2, \ldots, x_n)$.


Two cases merit special consideration. When $q = 1$, the function changes
at exactly the same rate as its arguments: Doubling all its arguments doubles
the function and so on. Such a function is called extensive. When $q = 0$,
the function is called intensive, and it will not change if all its arguments
are changed by exactly the same factor.

In many cases, we want a relation between $f$ and its partial derivatives.
We shall find this relation by differentiating both sides of Box 2.2.3 with
respect to $\lambda$. To avoid any confusion, let us evaluate both sides at the point
$(b_1, b_2, \ldots, b_n)$ after differentiation. Differentiation of the RHS is easy:

$$\text{RHS} = q\lambda^{q-1} f(b_1, b_2, \ldots, b_n).$$

For the LHS, we first let $y_i = \lambda x_i$ for all $i = 1, 2, \ldots, n$, so that we have a
single variable (one symbol) in the $i$th place, and note that

$$\frac{d}{d\lambda} f(y_1, y_2, \ldots, y_n) = \sum_{i=1}^{n} \frac{\partial f}{\partial y_i}\frac{dy_i}{d\lambda} = \sum_{i=1}^{n} x_i\,\frac{\partial f}{\partial y_i},$$

where we have used the fact that $dy_i/d\lambda = x_i$ by the definition of $y_i$. Evaluating
the result at $x_i = b_i$, we obtain

$$\text{LHS} = \sum_{i=1}^{n} b_i\,\partial_i f(\lambda b_1, \lambda b_2, \ldots, \lambda b_n).$$

Equating the LHS and the RHS, we obtain the important result

$$q\lambda^{q-1} f(b_1, b_2, \ldots, b_n) = \sum_{i=1}^{n} b_i\,\partial_i f(\lambda b_1, \lambda b_2, \ldots, \lambda b_n).$$

This relation holds for all values of $\lambda$; in particular, we can substitute $\lambda = 1$ to
obtain $q f(b_1, b_2, \ldots, b_n) = \sum_{i=1}^{n} b_i\,\partial_i f(b_1, b_2, \ldots, b_n)$. Keep in mind that the
$b$'s, although fixed, are completely arbitrary. In particular, one can substitute
$x$'s for them and arrive at the functional relation

$$q f(x_1, x_2, \ldots, x_n) = \sum_{i=1}^{n} x_i\,\partial_i f(x_1, x_2, \ldots, x_n). \tag{2.18}$$

This is the relation we were looking for.
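Both the defining property in Box 2.2.3 and Equation (2.18) are easy to test numerically. The sketch below uses a function of our own choosing, $f(x, y, z) = x^2 y/z + \sqrt{x^2 + y^2}\, z$. This function is homogeneous of degree $q = 2$ for positive $\lambda$; the square root is why $\lambda > 0$ is required. The point $\mathbf{b}$ and the factor $\lambda$ are arbitrary.

```python
import math

def f(x, y, z):
    # homogeneous of degree q = 2 for lambda > 0 (illustrative choice)
    return x**2 * y / z + math.sqrt(x**2 + y**2) * z

q = 2

def partial(k, p, eps=1e-6):
    # central difference in slot k (counted from 0 here)
    plus, minus = list(p), list(p)
    plus[k] += eps
    minus[k] -= eps
    return (f(*plus) - f(*minus)) / (2 * eps)

b = (0.8, -1.1, 1.7)      # an arbitrary point (b_1, b_2, b_3)
lam = 2.5                 # an arbitrary positive scaling factor lambda

# Box 2.2.3: f(lambda b) = lambda^q f(b)
scaled = f(*(lam * c for c in b))
expected_scaled = lam**q * f(*b)

# Equation (2.18): q f(b) = sum_i b_i d_i f(b)
euler_lhs = q * f(*b)
euler_rhs = sum(b[i] * partial(i, b) for i in range(3))
print(scaled, expected_scaled)
print(euler_lhs, euler_rhs)
```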

Another important result, which the reader is asked to derive in Problem 
2.17, is 


Box 2.2.4. If $f$ is homogeneous of degree $q$, then $\partial_i f$ is homogeneous of
degree $q - 1$.
Example 2.2.9. We have already seen that the natural variables of the internal
energy $U$ of a thermodynamical system are entropy $S$, volume $V$, and number of
moles $N$. Based on physical intuition, we expect the total internal energy, entropy,
volume, and number of moles of the combined system to be doubled when two identical
systems are brought together. We conclude that the internal energy function
increases by the same factor as its arguments. A thermodynamic quantity that has
this property is called an extensive variable. It follows that $U$ is an extensive
variable and a homogeneous function of degree one.

Now consider temperature $T$, pressure $P$, and chemical potential $\mu$, which are
all partial derivatives of $U$ with respect to its natural variables (up to a sign, in the case of $P$). From Problem
2.17 (Box 2.2.4), we conclude that these quantities are homogeneous of degree zero. It follows
that, if we bring two identical systems together, temperature, pressure, and the
chemical potential will not change, a result expected on physical grounds. Such a
thermodynamic quantity is called an intensive variable. ■
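To see Example 2.2.9 in numbers, the sketch below again assumes a toy fundamental relation $U = C S^3/(NV)$ with a hypothetical constant $C$. This is not a model from the text; it is merely a simple homogeneous function of degree one. Doubling $(S, V, N)$ doubles $U$. The partial derivatives that define $T$, $P$ (up to sign), and $\mu$ do not change, as Box 2.2.4 predicts.

```python
C = 1.0   # hypothetical positive constant of the toy model

def U(S, V, N):
    # toy fundamental relation, homogeneous of degree one
    return C * S**3 / (N * V)

def partial(k, p, eps=1e-6):
    # central difference of U in slot k (0: S, 1: V, 2: N)
    plus, minus = list(p), list(p)
    plus[k] += eps
    minus[k] -= eps
    return (U(*plus) - U(*minus)) / (2 * eps)

state = (1.2, 0.9, 2.0)                  # (S, V, N), arbitrary values
doubled = tuple(2 * c for c in state)    # two identical systems brought together

U_ratio = U(*doubled) / U(*state)                     # extensive: expect 2
T_ratio = partial(0, doubled) / partial(0, state)     # T = dU/dS: expect 1
P_ratio = partial(1, doubled) / partial(1, state)     # P = -dU/dV: expect 1
mu_ratio = partial(2, doubled) / partial(2, state)    # mu = dU/dN: expect 1
print(U_ratio, T_ratio, P_ratio, mu_ratio)
```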


2.3 Elements of Length, Area, and Volume 

We mentioned earlier the significance of the second interpretation of the
derivative in conjunction with density. This interpretation is often used in
reverse order, i.e., in writing the infinitesimal (element) of the physical quantity
as a product of density and the element of volume (or area, or length).
These elements appear inside integrals and will be integrated over (see the
next chapter). As a concrete example, let us consider the mass element, which
can be expressed as

volume distribution: $dm(\mathbf{r}') = \rho(\mathbf{r}')\,dV(\mathbf{r}')$
surface distribution: $dm(\mathbf{r}') = \sigma(\mathbf{r}')\,da(\mathbf{r}')$
linear distribution: $dm(\mathbf{r}') = \lambda(\mathbf{r}')\,dl(\mathbf{r}')$

where $\mathbf{r}'$ denotes the coordinates of the location of the element of mass.

The relations above reduce the problem to that of writing the elements 
of volume, area, and length. Most of the time, the evaluation of the integral 
simplifies considerably if we choose the correct coordinate system. Therefore, 
we need these elements in all three coordinate systems. 

Basic to the calculation of all elements are elements of length in the direction
of unit vectors in any of the three coordinate systems. First we define


Box 2.3.1. The primary curve along any given coordinate is the curve
obtained when that coordinate is allowed to vary while the other two coordinates
are held fixed.


The primary length elements are infinitesimal lengths along the primary
curves. By construction, they are also infinitesimal lengths along unit vectors.
To find a primary length element at point $P'$ with position vector $\mathbf{r}'$ along
a given unit vector, one keeps the other two coordinates fixed and allows
the given coordinate to change by an infinitesimal amount.¹⁰ This procedure
displaces $P'$ an infinitesimal distance. The length of this displacement, written
in terms of the coordinates of $P'$, is the primary length element along the
given unit vector. Once the three primary length elements are found, we
can calculate area and volume elements by multiplying appropriate length
elements.

A notion related to the primary length is 


Box 2.3.2. A primary surface perpendicular to a primary length is 
obtained when the coordinate determining the primary length is held fixed 
and the other two coordinates are allowed to vary arbitrarily. 


The primary element of area at a point on a primary surface is, by definition,
the product of the two primary length elements whose coordinates define
that surface.

Integrating over a primary surface of a coordinate system is facilitated if
all boundaries of the surface can be described by $q_i = c_i$, where $q_i$ is either of
the two coordinates that vary on the surface and $c_i$ is a constant. For example,
the third primary surface in Cartesian coordinates is a plane parallel to the
$xy$-plane. A problem involving integration on this plane becomes simplest if
the boundaries of the region of integration are of the form $x = c_1$ and $y = c_2$,
i.e., if the region of integration is a rectangle.

Finally, by taking the product of all three primary length elements, we 
obtain the volume element in the given coordinate system. 

2.3.1 Elements in a Cartesian Coordinate System 

Consider the point $P'$ with coordinates $(x', y', z')$ as shown in Figure 2.2. To
find the primary length along $\hat{e}_{x'} = \hat{e}_x$,¹¹ keep $y'$ and $z'$ fixed and let $x'$
change to $x' + dx'$. Then $P'$ will be displaced by $dx'$ along $\hat{e}_x$. Thus, the
first primary length element, denoted by $dl_1$, is simply $dx'$. Similarly, we
have $dl_2 = dy'$ and $dl_3 = dz'$. A general infinitesimal displacement, which is
a vector, can be written as

$$d\mathbf{l} = \hat{e}_x\,dl_1 + \hat{e}_y\,dl_2 + \hat{e}_z\,dl_3 = \hat{e}_x\,dx' + \hat{e}_y\,dy' + \hat{e}_z\,dz' = d\mathbf{r}'. \tag{2.19}$$

Figure 2.2 shows that $d\mathbf{l}$ represents the displacement vector from $P'$, with
position vector $\mathbf{r}'$, to a neighboring point $P''$, with position vector $\mathbf{r}''$. But
this displacement is simply the increment in the position vector of $P'$. That is
why $d\mathbf{r}'$ is also used for $d\mathbf{l}$. Note that this vectorial infinitesimal displacement
includes the primary length elements as special cases: When a coordinate is
held fixed, the corresponding differential will be zero. Thus, setting $dy' = 0 = dz'$,
i.e., holding $y'$ and $z'$ fixed, we recover the first primary length element.
The length of $d\mathbf{l}$ is also of interest:

$$dl = |d\mathbf{l}| = \sqrt{dl_1^2 + dl_2^2 + dl_3^2} = \sqrt{(dx')^2 + (dy')^2 + (dz')^2} \equiv \sqrt{dx'^2 + dy'^2 + dz'^2}. \tag{2.20}$$

Figure 2.2: Elements of length, area, and volume in Cartesian coordinates.

10 Usually an infinitesimal amount is expressed by a differential. Thus, an increment in $x$
is simply $dx$.

11 Recall that this equality holds in Cartesian (and only in Cartesian) coordinates, where
the unit vectors are independent of the coordinates of $P'$.

In one-dimensional problems involving curves, one is either given, or has
to find, the parametric equation of a curve $\gamma$ whereby the coordinates
$(x', y', z')$ of a point on $\gamma$ are expressed as functions of a parameter, usually
denoted by $t$. This is concisely written as

$$\gamma(t) = (x', y', z') = (f(t), g(t), h(t)),$$

so that the “curve function” $\gamma$ takes a real number $t$ and gives three real
numbers $f(t)$, $g(t)$, and $h(t)$, which are the coordinates $x'$, $y'$, and $z'$ of a
point on the curve in space. Usually one considers an interval¹² $(a, b)$ for the
real variable $t$. Then $(f(a), g(a), h(a))$ is the initial point of the curve and
$(f(b), g(b), h(b))$ its final point. The parameter $t$ and the functions $f$, $g$, and
$h$ are not unique. For example, the three functions

$$f_1(t) = a\cos t, \qquad g_1(t) = a\sin t, \qquad h_1(t) = 0, \qquad 0 < t < \pi,$$

describe a semicircle in the $xy$-plane. However,

$$f_2(t) = a\cos(t^3), \qquad g_2(t) = a\sin(t^3), \qquad h_2(t) = 0, \qquad 0 < t < \pi^{1/3},$$

also describe the same semicircle. This arbitrariness is useful, because it allows
us to choose $f$, $g$, and $h$ so that calculations become simple.

12 Do not confuse this with the coordinates of a point in the plane. The notation $(a, b)$
here means all the real numbers between $a$ and $b$ excluding $a$ and $b$ themselves.

For “flat” curves [lying in the $xy$-plane and given by an equation $y = f(x)$],
one obvious parameterization, which may not be the most convenient one, is
$x = t$, $y = f(t)$.

Let us assume that we have chosen the three functions and they are of the
form

$$x' = f(t), \qquad y' = g(t), \qquad z' = h(t).$$

Then the primary lengths can be written as

$$dx' = f'(t)\,dt, \qquad dy' = g'(t)\,dt, \qquad dz' = h'(t)\,dt,$$

and the element of displacement along the curve becomes

$$d\mathbf{r}'(t) = d\mathbf{l}(t) = \hat{e}_x f'(t)\,dt + \hat{e}_y g'(t)\,dt + \hat{e}_z h'(t)\,dt,$$

$$|d\mathbf{r}'(t)| = dl(t) = \sqrt{[f'(t)\,dt]^2 + [g'(t)\,dt]^2 + [h'(t)\,dt]^2} = \sqrt{[f'(t)]^2 + [g'(t)]^2 + [h'(t)]^2}\;dt, \tag{2.21}$$

where a prime on a function denotes its derivative with respect to its argument.¹³
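Equation (2.21) also shows why the freedom in choosing a parameterization is harmless: the total length does not depend on it. The sketch below integrates $\sqrt{f'^2 + g'^2 + h'^2}$ with a midpoint rule for both parameterizations of the semicircle given earlier. The helper `arc_length`, the radius value, and the number of subintervals are our own choices. Both runs give $\pi a$.

```python
import math

def arc_length(fp, gp, hp, t0, t1, n=20000):
    """Midpoint-rule approximation of the integral of
    sqrt(f'(t)^2 + g'(t)^2 + h'(t)^2) dt from t0 to t1 (Equation 2.21)."""
    dt = (t1 - t0) / n
    total = 0.0
    for i in range(n):
        t = t0 + (i + 0.5) * dt
        total += math.sqrt(fp(t)**2 + gp(t)**2 + hp(t)**2) * dt
    return total

a = 2.0
# first parameterization: (a cos t, a sin t, 0), 0 < t < pi
L1 = arc_length(lambda t: -a * math.sin(t), lambda t: a * math.cos(t),
                lambda t: 0.0, 0.0, math.pi)
# second parameterization: (a cos t^3, a sin t^3, 0), 0 < t < pi^(1/3)
L2 = arc_length(lambda t: -3 * a * t**2 * math.sin(t**3),
                lambda t: 3 * a * t**2 * math.cos(t**3),
                lambda t: 0.0, 0.0, math.pi ** (1 / 3))
print(L1, L2, a * math.pi)
```

The first parameterization moves at constant speed $a$, while the second moves at speed $3at^2$. The integrands differ, but the integrals agree.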

The first primary surface at $P'$ is obtained by holding $x'$ constant and
letting the other two coordinates vary arbitrarily. It is clear that the resulting
surface is a plane passing through $P'$ and parallel to the $yz$-plane. It is also
clear that the first primary length element, $dx'$, is perpendicular to the first
primary surface. The first primary element of area, denoted by $da_1$, is simply
$dy'\,dz'$. The second and third primary surfaces are planes parallel to the $xz$-plane
and the $xy$-plane, respectively. These planes are perpendicular to their corresponding
length elements. The primary elements of area are obtained similarly. We
thus have

$$da_1 = dy'\,dz', \qquad da_2 = dx'\,dz', \qquad da_3 = dx'\,dy'. \tag{2.22}$$

Finally, the volume element is

$$dV = dl_1\,dl_2\,dl_3 = dx'\,dy'\,dz'. \tag{2.23}$$


2.3.2 Elements in a Spherical Coordinate System 

The point $P'$ in Figure 2.3 now has coordinates $(r', \theta', \varphi')$. To find the primary
length along $\hat{e}_{r'}$, keep $\theta'$ and $\varphi'$ fixed and let $r'$ change to $r' + dr'$. Then $P'$
will be displaced by $dr'$ along $\hat{e}_{r'}$. Thus, the first primary length element, $dl_1$,
is simply $dr'$. To find the primary length along $\hat{e}_{\theta'}$, keep $r'$ and $\varphi'$ fixed, i.e.,
confine yourself to the plane passing through $P'$ and the polar (or $z$) axis,
and let $\theta'$ change to $\theta' + d\theta'$. Then $P'$ will be displaced by $r'\,d\theta'$ along
$\hat{e}_{\theta'}$.¹⁴ The primary length along $\hat{e}_{\varphi'}$ is obtained by keeping $r'$ and $\theta'$ fixed,
i.e., confining oneself to a plane passing through $P'$ and perpendicular to the
$z$-axis,¹⁵ and letting $\varphi'$ change to $\varphi' + d\varphi'$. Then $P'$ will be displaced along
a circle of radius $r'\sin\theta'$ by an angle $d\varphi'$. This can be seen by noting that $P'$
moves in a plane parallel to the $xy$-plane and that the square of its distance from the $z$-axis is given by

$$x'^2 + y'^2 = (r'\sin\theta'\cos\varphi')^2 + (r'\sin\theta'\sin\varphi')^2 = r'^2\sin^2\theta',$$

and that the RHS, which is the square of the radius of the circle, is a constant.
The displacement of $P'$ is therefore $r'\sin\theta'\,d\varphi'$ along $\hat{e}_{\varphi'}$. A general
infinitesimal (vector) displacement can, therefore, be written as

$$d\mathbf{r}' = d\mathbf{l} = \hat{e}_{r'}\,dl_1 + \hat{e}_{\theta'}\,dl_2 + \hat{e}_{\varphi'}\,dl_3 = \hat{e}_{r'}\,dr' + \hat{e}_{\theta'}\,r'\,d\theta' + \hat{e}_{\varphi'}\,r'\sin\theta'\,d\varphi'. \tag{2.24}$$

Note again that this vectorial infinitesimal displacement includes the primary
length elements as special cases. Thus, setting $d\theta' = 0 = d\varphi'$, i.e., holding $\theta'$
and $\varphi'$ fixed, we recover the first primary length element. The length of $d\mathbf{r}'$
(or $d\mathbf{l}$) is

$$|d\mathbf{r}'| = dl = \sqrt{(dr')^2 + (r'\,d\theta')^2 + (r'\sin\theta'\,d\varphi')^2} = \sqrt{dr'^2 + r'^2\,d\theta'^2 + r'^2\sin^2\theta'\,d\varphi'^2}. \tag{2.25}$$

Do not confuse $|d\mathbf{r}'|$ with $dr'$; they are not equal.

Figure 2.3: Elements of length, area, and volume in spherical coordinates. We have
used “Δ” instead of “d.”

13 The use of primes to represent both the derivative and the coordinates of the element of
the source (such as $dm$) is unfortunately confusing. However, this practice is so widespread
that any alteration to it would result in more confusion. The context of any given problem
is usually clear enough to resolve such confusion.

14 Since $r'$ is held fixed, $P'$ is confined to move on a circle of radius $r'$, describing an
infinitesimal arc subtended by the angle $d\theta'$.

15 Fixing $r'$ and $\theta'$ fixes $z' = r'\cos\theta'$, which describes a plane parallel to the $xy$-plane, i.e.,
a plane perpendicular to the $z$-axis.

If we know the parametric equation of a curve in spherical coordinates,
i.e., if the coordinates $r'$, $\theta'$, and $\varphi'$ of a point on the curve can be expressed as
functions of the parameter $t$, then we can find the differentials in terms of $dt$
and substitute in Equation (2.25) to find an expression analogous to Equation
(2.21). We leave this as an exercise for the reader.

The first primary surface at $P'$ is obtained by holding $r'$ constant and
letting the other two coordinates vary arbitrarily. It is clear that the resulting
surface is a sphere of radius $r'$ passing through $P'$. It is also clear that the
first primary length element $dr'$ is perpendicular to the first primary surface.
It is not hard to convince oneself that the second and third primary surfaces
are, respectively, a cone of (half) angle $\theta'$, and a plane containing the $z$-axis
and making an angle of $\varphi'$ with the $x$-axis. These surfaces are perpendicular
to their corresponding length elements. The primary elements of area are
obtained easily. We simply quote the results:

$$da_1 = (r'\,d\theta')(r'\sin\theta'\,d\varphi') = r'^2\sin\theta'\,d\theta'\,d\varphi', \quad da_2 = (dr')(r'\sin\theta'\,d\varphi') = r'\sin\theta'\,dr'\,d\varphi', \quad da_3 = (dr')(r'\,d\theta') = r'\,dr'\,d\theta'. \tag{2.26}$$

Finally, the volume element is

$$dV = (dr')(r'\,d\theta')(r'\sin\theta'\,d\varphi') = r'^2\sin\theta'\,dr'\,d\theta'\,d\varphi'. \tag{2.27}$$
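Integrating the elements (2.26) and (2.27) over their full ranges should recover the familiar area and volume of a sphere. This sketch does so with plain midpoint sums. The helper `midpoints` and the grid sizes are our own choices, and the loop over $\varphi$ is kept only to mirror the triple integral. The results agree with $4\pi R^2$ and $\tfrac{4}{3}\pi R^3$ to about one part in $10^4$.

```python
import math

def midpoints(a, b, n):
    """Midpoints of n equal subintervals of [a, b], plus the subinterval width."""
    step = (b - a) / n
    return [a + (i + 0.5) * step for i in range(n)], step

def ball_volume(R, n=60):
    rs, dr = midpoints(0.0, R, n)
    ths, dth = midpoints(0.0, math.pi, n)
    phs, dph = midpoints(0.0, 2 * math.pi, n)
    total = 0.0
    for r in rs:
        for th in ths:
            for _ in phs:
                # dV = r^2 sin(theta) dr dtheta dphi, Equation (2.27)
                total += r**2 * math.sin(th) * dr * dth * dph
    return total

def sphere_area(R, n=200):
    ths, dth = midpoints(0.0, math.pi, n)
    phs, dph = midpoints(0.0, 2 * math.pi, n)
    # da_1 = R^2 sin(theta) dtheta dphi on the sphere r = R, Equation (2.26)
    return sum(R**2 * math.sin(th) * dth * dph for th in ths for _ in phs)

R = 1.5
volume = ball_volume(R)
area = sphere_area(R)
print(volume, 4 / 3 * math.pi * R**3)
print(area, 4 * math.pi * R**2)
```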


Table 2.1 gathers together all the primary curves and surfaces for the 
three coordinate systems used frequently in this book. The reader is advised 
to remember that 


Box 2.3.3. All the differentials of Table 2.1 carry a prime to emphasize 
that they are evaluated at P ', the location of infinitesimal elements. 


Coordinate system | Primary curves | Primary surfaces
Cartesian | 1st: Straight line ($x$-axis) | $yz$-plane
 | 2nd: Straight line ($y$-axis) | $xz$-plane
 | 3rd: Straight line ($z$-axis) | $xy$-plane
Cylindrical | 1st: Rays perp. to $z$-axis | Cylinder with axis $z$
 | 2nd: Circle centered on $z$-axis | Half-plane from $z$-axis
 | 3rd: Straight line ($z$-axis) | Plane perp. $z$-axis
Spherical | 1st: Rays from origin | Sphere
 | 2nd: Half-circle | Cone of half angle $\theta$
 | 3rd: Circle centered on polar axis | Half-plane from $z$-axis

Table 2.1: Primary curves and surfaces of the three common coordinate systems.
2.3.3 Elements in a Cylindrical Coordinate System

The coordinates of $P'$ are now $(\rho', \varphi', z')$ as shown in Figure 2.4. To find the
primary length along $\hat{e}_{\rho'}$, keep $\varphi'$ and $z'$ fixed and let $\rho'$ change to $\rho' + d\rho'$.
Then $P'$ will be displaced by $d\rho'$ along $\hat{e}_{\rho'}$. Thus, the first primary length
element $dl_1$ is simply $d\rho'$. To find the primary length along $\hat{e}_{\varphi'}$, keep $\rho'$ and $z'$
fixed, i.e., confine yourself to a circle of radius $\rho'$ in the plane passing through
$P'$ and perpendicular to the $z$-axis, and let $\varphi'$ change to $\varphi' + d\varphi'$. Then $P'$
will be displaced by $\rho'\,d\varphi'$ along $\hat{e}_{\varphi'}$. The primary length along $\hat{e}_{z'} = \hat{e}_z$¹⁶ is
obtained by keeping $\rho'$ and $\varphi'$ fixed, and letting $z'$ change to $z' + dz'$. Then
$P'$ will be displaced by $dz'$. A general infinitesimal (vector) displacement can,
therefore, be written as

$$d\mathbf{r}' = d\mathbf{l} = \hat{e}_{\rho'}\,dl_1 + \hat{e}_{\varphi'}\,dl_2 + \hat{e}_{z'}\,dl_3 = \hat{e}_{\rho'}\,d\rho' + \hat{e}_{\varphi'}\,\rho'\,d\varphi' + \hat{e}_z\,dz'. \tag{2.28}$$

Note again that this infinitesimal displacement includes the primary length
elements as special cases. The length of this vector is

$$|d\mathbf{r}'| = dl = \sqrt{(d\rho')^2 + (\rho'\,d\varphi')^2 + (dz')^2} = \sqrt{d\rho'^2 + \rho'^2\,d\varphi'^2 + dz'^2}. \tag{2.29}$$

If we know the parametric equation of a curve in cylindrical coordinates,
i.e., if the coordinates $\rho'$, $\varphi'$, and $z'$ of a point on the curve can be expressed as
functions of the parameter $t$, then we can find the differentials in terms of $dt$
and substitute in Equation (2.29) to find an expression analogous to Equation
(2.21). We leave this as an exercise for the reader.

Figure 2.4: Elements of length, area, and volume in cylindrical coordinates. We have
used “Δ” instead of “d.”

16 This is the only unit vector in “curvilinear coordinates” which is independent of the
position of $P'$.

The first primary surface at $P'$ is obtained by holding $\rho'$ constant and
letting the other two coordinates vary arbitrarily. It is clear that the resulting
surface is a cylinder of radius $\rho'$ passing through $P'$. It is also clear that the
first primary length element $d\rho'$ is perpendicular to the first primary surface.
The second and third primary surfaces are, respectively, a plane containing
the $z$-axis and making an angle of $\varphi'$ with the $x$-axis, and a plane perpendicular
to the $z$-axis and cutting it at $z'$. These surfaces are perpendicular to
their corresponding length elements. The primary elements of area are again
obtained easily, and we merely quote the results

$$da_1 = (\rho'\,d\varphi')(dz') = \rho'\,d\varphi'\,dz', \qquad da_2 = d\rho'\,dz', \qquad da_3 = (d\rho')(\rho'\,d\varphi') = \rho'\,d\rho'\,d\varphi'. \tag{2.30}$$

Finally, the volume element is

$$dV = (d\rho')(\rho'\,d\varphi')(dz') = \rho'\,d\rho'\,d\varphi'\,dz'. \tag{2.31}$$


Table 2.2 gathers together all the elements of primary length, surface, and 
volume for the three commonly used coordinate systems. 

Example 2.3.1. Examples of elements in various coordinate systems 

(a) The element of length in the p direction at a point with spherical coordinates 
(a, 7 , p) is a sin 7 dip. Note that this element is independent of p, and for a fixed a, 
it lias the largest value when 7 = tt/2, corresponding to the equatorial plane. 

(b) The element of area for a cone of half-angle a is r sin a dr dp, because for a cone, 
9 is a constant (in this case, a). 


Coordinate 

system 

Primary 

length 

elements 

Primary 

area 

elements 

Volume 

element 

Cartesian 

(x,y,z) 

1 st: dx 

2 nd: dy 

3rd: dz 

dy dz 
dx dz 
dx dy 

dx dy dz 

Cylindrical 

(. P > Vb *) 

1 st: dp 

2 nd: pdp 

3rd: dz 

pdp dz 
dpdz 
pdp dp 

pdp dp dz 

Spherical 
(r, 9, p) 

1 st: dr 

2 nd: r dO 

3rd: r sin 6 dp 

r 2 sin 6 dd dp 
r sin 9 dr dp 
r dr d9 

r 2 sin 9 dr d9 dp 


Table 2.2: Primary length and area as well as volume elements in the three common 
coordinate systems. In almost all applications of the next chapter each of these variables 
carries a prime. 
(c) The element of area of a cylinder of radius $a$ is $a\,d\varphi\,dz$.

(d) The element of area of a sphere of radius $a$ is $a^2\sin\theta\,d\theta\,d\varphi$. Note that the largest element of area (for given $d\theta$ and $d\varphi$) is at the equator and the smallest (zero) at the two poles.

(e) The element of area of a half-plane containing the $z$-axis and making an angle $\alpha$ with the $x$-axis is $d\rho\,dz$, independent of the angle $\alpha$.
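As a quick numerical sanity check of Table 2.2 (a sketch of our own, not part of the original text; the function names and the number of subdivisions `n` are arbitrary choices), one can add up the spherical area element $a^2\sin\theta\,d\theta\,d\varphi$ over a whole sphere and the cylindrical volume element $\rho\,d\rho\,d\varphi\,dz$ over a solid cylinder using midpoint sums. The totals should approach $4\pi a^2$ and $\pi R^2 H$.

```python
import math


def sphere_area(a: float, n: int = 400) -> float:
    """Midpoint sum of a^2 sin(theta) dtheta dphi over the whole sphere."""
    dtheta = math.pi / n
    dphi = 2 * math.pi / n
    total = 0.0
    for i in range(n):
        theta = (i + 0.5) * dtheta
        # The element does not depend on phi, so the n terms of the phi sum are equal.
        total += n * (a**2 * math.sin(theta) * dtheta * dphi)
    return total


def cylinder_volume(R: float, H: float, n: int = 200) -> float:
    """Midpoint sum of rho drho dphi dz over a solid cylinder of radius R, height H."""
    drho, dphi, dz = R / n, 2 * math.pi / n, H / n
    total = 0.0
    for i in range(n):
        rho = (i + 0.5) * drho
        # The element does not depend on phi or z: n * n equal terms for each rho.
        total += n * n * (rho * drho * dphi * dz)
    return total


print(sphere_area(2.0), 4 * math.pi * 2.0**2)
print(cylinder_volume(1.0, 3.0), math.pi * 1.0**2 * 3.0)
```

Because the $\rho$ factor is linear, the midpoint sum for the cylinder is exact up to rounding, while the sphere sum approaches $4\pi a^2$ as `n` grows.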


finding unit vectors without use of geometry!


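Example 2.3.2. Suppose Cartesian coordinates of the plane are related to two other variables $u$ and $v$ via the formulas

$$
x = f(u, v), \qquad y = g(u, v).
$$

We want to consider $u$ and $v$ as coordinates and find the unit vectors corresponding to them using our knowledge of differentiation gained in this chapter, without any resort to geometric arguments.

In general, the unit vector in the direction of any coordinate variable at a point $P$ is obtained by increasing the coordinate slightly (keeping the other coordinate variables constant), calculating the displacement vector described by the motion of $P$, and dividing this vector by its length. So, consider changing $u$ while $v$ is kept constant. Call the displacement obtained $\Delta\mathbf{r}_1$. Then

$$
\Delta\mathbf{r}_1 = \mathbf{e}_x\,\Delta x + \mathbf{e}_y\,\Delta y \approx \mathbf{e}_x\,\frac{\partial f}{\partial u}\,\Delta u + \mathbf{e}_y\,\frac{\partial g}{\partial u}\,\Delta u
$$

and

$$
|\Delta\mathbf{r}_1| = \sqrt{(\Delta x)^2 + (\Delta y)^2} \approx \sqrt{\left(\frac{\partial f}{\partial u}\right)^2 + \left(\frac{\partial g}{\partial u}\right)^2}\,\Delta u .
$$

Therefore,

$$
\mathbf{e}_u = \lim_{\Delta u \to 0}\frac{\Delta\mathbf{r}_1}{|\Delta\mathbf{r}_1|} = \frac{\mathbf{e}_x\,\partial f/\partial u + \mathbf{e}_y\,\partial g/\partial u}{\sqrt{(\partial f/\partial u)^2 + (\partial g/\partial u)^2}} .
$$

For $\mathbf{e}_v$, we keep $u$ fixed and vary $v$. Calling the resulting displacement $\Delta\mathbf{r}_2$, we easily obtain

$$
\mathbf{e}_v = \lim_{\Delta v \to 0}\frac{\Delta\mathbf{r}_2}{|\Delta\mathbf{r}_2|} = \frac{\mathbf{e}_x\,\partial f/\partial v + \mathbf{e}_y\,\partial g/\partial v}{\sqrt{(\partial f/\partial v)^2 + (\partial g/\partial v)^2}} .
$$

Note that for general $f$ and $g$, $\mathbf{e}_u$ and $\mathbf{e}_v$ are not perpendicular.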

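The result can easily be generalized to three variables. In fact, if

$$
x = f(u, v, w), \qquad y = g(u, v, w), \qquad z = h(u, v, w),
$$

then a calculation similar to the one above will yield

$$
\mathbf{e}_u = \frac{\mathbf{e}_x\,\partial f/\partial u + \mathbf{e}_y\,\partial g/\partial u + \mathbf{e}_z\,\partial h/\partial u}{\sqrt{(\partial f/\partial u)^2 + (\partial g/\partial u)^2 + (\partial h/\partial u)^2}},
$$

with analogous expressions for $\mathbf{e}_v$ and $\mathbf{e}_w$ obtained by replacing $u$ with $v$ and $w$, respectively.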
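The recipe of Example 2.3.2 can also be carried out numerically. The sketch below is our own illustration, not the author's method: the function `unit_vectors`, the step `h`, and the sample points are assumptions. It replaces each partial derivative with a central difference and then normalizes. For polar coordinates ($x = u\cos v$, $y = u\sin v$) the two unit vectors come out perpendicular, while for a hypothetical sheared system $x = u + v$, $y = v$ they do not, in line with the remark that $\mathbf{e}_u$ and $\mathbf{e}_v$ need not be perpendicular.

```python
import math
from typing import Callable, Tuple

Vec = Tuple[float, float]


def unit_vectors(f: Callable[[float, float], float],
                 g: Callable[[float, float], float],
                 u: float, v: float, h: float = 1e-6) -> Tuple[Vec, Vec]:
    """Return (e_u, e_v) at (u, v) for x = f(u, v), y = g(u, v).

    Partial derivatives are approximated by central differences with step h.
    """
    fu = (f(u + h, v) - f(u - h, v)) / (2 * h)
    gu = (g(u + h, v) - g(u - h, v)) / (2 * h)
    fv = (f(u, v + h) - f(u, v - h)) / (2 * h)
    gv = (g(u, v + h) - g(u, v - h)) / (2 * h)
    nu = math.hypot(fu, gu)  # length of the displacement per unit change in u
    nv = math.hypot(fv, gv)  # length of the displacement per unit change in v
    return (fu / nu, gu / nu), (fv / nv, gv / nv)


def dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1]


# Polar coordinates: e_u and e_v should be perpendicular.
e_u, e_v = unit_vectors(lambda u, v: u * math.cos(v),
                        lambda u, v: u * math.sin(v), 2.0, 0.7)
print(e_u, e_v, dot(e_u, e_v))

# Sheared coordinates x = u + v, y = v: e_u and e_v are not perpendicular.
s_u, s_v = unit_vectors(lambda u, v: u + v, lambda u, v: v, 1.0, 1.0)
print(s_u, s_v, dot(s_u, s_v))
```

For the polar case the result should reproduce $\mathbf{e}_u = (\cos v, \sin v)$ and $\mathbf{e}_v = (-\sin v, \cos v)$; the sheared case gives a dot product of $1/\sqrt{2}$.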



2.4 Problems 

2.1. Find the partial derivatives of the following functions at the given points with respect to the given variables. In the following, $\mathbf{r} = (x, y, z)$ and $\mathbf{r}' = (x', y', z')$.

$e^{xyz}$ with respect to $x$ at $(1, 0, -1)$,
$\cos(xy/z)$ with respect to $z$ at $(\pi, 1, 1)$,
$xy + yz + zx$ with respect to $y$ at $(1, -1, 2)$,
$\ln\left(\dfrac{ax + by + cz}{x^2 + y^2 + z^2}\right)$ with respect to $x$ at $(a, b, c)$,
$r = \sqrt{x^2 + y^2 + z^2}$ with respect to $x$ at $(x, y, z)$,
$|\mathbf{r} - \mathbf{r}'|$ with respect to $y$ at $(x, y, z, x', y', z')$,
$1/|\mathbf{r} - \mathbf{r}'|$ with respect to $z'$ at $(x, y, z, x', y', z')$.


2.2. The Earth has a radius of 6400 km. The thickness of the atmosphere 
is about 50 km. Starting with the volume of a sphere and using differentials, 
estimate the volume of the atmosphere. Hint: Find the change in the volume 
of a sphere when its radius changes by a “small” amount. 

2.3. The gravitational potential (potential energy per unit mass) at a distance $r$ from the center of the Earth (assumed to be the origin of a Cartesian coordinate system) is given by

$$
\Phi = -\frac{GM}{r}, \qquad r = \sqrt{x^2 + y^2 + z^2},
$$

where $G = 6.67 \times 10^{-11}$ N·m²/kg² and $M = 6 \times 10^{24}$ kg. Using differentials, find the energy needed to raise a 10-kg object from the point with coordinates (4000 km, 4000 km, 3000 km) to a point with coordinates (4020 km, 4050 km, 3010 km).






2.4. Show that the function $f(x \pm ct)$ satisfies the one-dimensional wave equation:

$$
\frac{\partial^2 f}{\partial x^2} - \frac{1}{c^2}\frac{\partial^2 f}{\partial t^2} = 0.
$$

Hint: Let $y = x \pm ct$ and use the chain rule.

2.5. Assume that $f'' + kf = 0$ and $g'' - kg = 0$. Show that $F(x, y) = f(x)g(y)$ satisfies the two-dimensional Laplace’s equation:

$$
\frac{\partial^2 F}{\partial x^2} + \frac{\partial^2 F}{\partial y^2} = 0.
$$

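2.6. Suppose that $f'' - \alpha f = 0$, $g'' - \beta g = 0$, and $h'' - \gamma h = 0$. Write an equation relating $\alpha$, $\beta$, and $\gamma$ such that the function

$$
F(x, y, z) = f(x)g(y)h(z)
$$

satisfies the three-dimensional Laplace’s equation:

$$
\frac{\partial^2 F}{\partial x^2} + \frac{\partial^2 F}{\partial y^2} + \frac{\partial^2 F}{\partial z^2} = 0.
$$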

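2.7. Suppose that $f'' - \alpha f = 0$, $g'' - \beta g = 0$, $h'' - \gamma h = 0$, and $u' - \omega u = 0$. Write an equation relating $\alpha$, $\beta$, $\gamma$, and $\omega$ such that the function

$$
F(x, y, z, t) = f(x)g(y)h(z)u(t)
$$

satisfies the heat equation:

$$
\frac{\partial^2 F}{\partial x^2} + \frac{\partial^2 F}{\partial y^2} + \frac{\partial^2 F}{\partial z^2} = a\,\frac{\partial F}{\partial t},
$$

where $a$ is a constant.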


2.8. Suppose that $f'' + k_x^2 f = 0$, $g'' + k_y^2 g = 0$, $h'' + k_z^2 h = 0$, and $u'' + \omega^2 u = 0$.

(a) Write an equation relating $k_x$, $k_y$, $k_z$, and $\omega$ such that the function

$$
F(x, y, z, t) = f(x)g(y)h(z)u(t)
$$

satisfies the three-dimensional wave equation:

$$
\frac{\partial^2 F}{\partial x^2} + \frac{\partial^2 F}{\partial y^2} + \frac{\partial^2 F}{\partial z^2} - \frac{1}{c^2}\frac{\partial^2 F}{\partial t^2} = 0.
$$

(b) If $\omega$ is considered as angular frequency, and $c$ as the speed of the wave, what is the magnitude of the vector $\mathbf{k} = (k_x, k_y, k_z)$?


2.9. Consider the function F(x,y,z ) = / 


9 

x y 


y 2 x 


in which $a$ is a constant. Assuming that $f'(2) = a$, find the unit vector $\mathbf{e}_v$ in the direction of

$$
\mathbf{v} = \mathbf{e}_x\,\partial_1 F(a, a, a) + \mathbf{e}_y\,\partial_2 F(a, a, a) + \mathbf{e}_z\,\partial_3 F(a, a, a).
$$
2.10. Consider the function

$$
F(x, y, z) = f\!\left(\frac{x^3y - y^3z + 2z^3x}{a^4}\right),
$$

in which $a$ is a constant. Assuming that $f'(17) = a$, find the unit vector $\mathbf{e}_v$ in the direction of

$$
\mathbf{v} = \mathbf{e}_x\,\partial_1 F(a, -a, 2a) + \mathbf{e}_y\,\partial_2 F(a, -a, 2a) + \mathbf{e}_z\,\partial_3 F(a, -a, 2a).
$$


2.11. Given that f(x,y,z) = e W x2+ v 2 + z 2 /a : 2 d- 2/ 2 _|_ ^ 2 ^ w h ere ft is a 
constant, find the radial component (component along e r ) of the vector 


V = e x di f{x, y, z) + e y d 2 f(x , y , z) + e z d 3 f(x, y , 2 ). 


2.12. Given that 

dif(x,y,z) = d 2 f{z,x,y) = d 3 f(y,z,x) = — - 

where ft is a constant, find the function f(x,y,z). Note the order of the 
variables in each pair of parentheses. 

2.13. Given that $f(x, y, z) = x^2 y \sin(yz/x)$, find

$$
\partial_2 f(1, 1, \pi/2), \quad \partial_1 f(2, \pi, 1), \quad \partial_3 f(4, \pi, 1), \quad f(y, z, x), \quad \partial_3 f(t, u, v).
$$

2.14. Derive the analogue of Equation (2.11) assuming this time that Q is 
held constant in all derivatives instead of W. 


2.15. Which of the following functions are homogeneous? 

iWA • X U x 2 U 2 xz 2 2 2 /x 

e v/ , xyz sm—, -cos^-, x +y —z, ax + y(z — x), 

az z y z 

where a is a constant. For those functions that are homogeneous, find their 

degree and verify that they satisfy Equation (2.18). 

2.16. Suppose $f$ and $g$ are homogeneous functions of degrees $q$ and $p$, respectively. What can you say about the homogeneity of $f \pm g$, $fg$, and $f/g$? If they are homogeneous, find their degree, and verify that they satisfy Equation (2.18).

2.17. If $f$ is homogeneous of degree $q$, show that $\partial_i f$ is homogeneous of degree $q - 1$. Hint: Use the definition of homogeneity and differentiate with respect to $x_i$.

2.18. A function $f(x, y, z)$ of Cartesian coordinates can also be thought of as a function of cylindrical coordinates $\rho$, $\varphi$, $z$, because the latter are functions of the former via the relations $\rho = \sqrt{x^2 + y^2}$ and $\tan\varphi = y/x$.

(a) Using the chain rule for differentiation, find $\partial f/\partial x$ and $\partial f/\partial y$ in terms of $\partial f/\partial\rho$ and $\partial f/\partial\varphi$. Express your answers entirely in terms of cylindrical coordinates.

(b) Show that the vector $\mathbf{e}_x\,\partial f/\partial x + \mathbf{e}_y\,\partial f/\partial y + \mathbf{e}_z\,\partial f/\partial z$, when written entirely in terms of cylindrical coordinates and cylindrical unit vectors, becomes

$$
\mathbf{e}_\rho\,\frac{\partial f}{\partial\rho} + \mathbf{e}_\varphi\,\frac{1}{\rho}\frac{\partial f}{\partial\varphi} + \mathbf{e}_z\,\frac{\partial f}{\partial z}.
$$


2.19. In each of the following, the partial derivative of a function is given. Find the most general function with such a derivative.

(a) $\partial_2 f(x, y, z) = xy^2z$.
(b) $\partial_1 f(x, y, z) = xy^2z$.
(c) $\partial_1 h(z, x, y) =$ £^*
(d) $\partial_1 g(z, x, y) = e^x y^2$.
(e) $\partial_2 g(z, x, y) = e^x y^2$.
(f) $\partial_2 h(x, y, z) =$ ^±
(g) $\partial_3 f(x, y, z) = xy^2z$.
(h) $\partial_3 g(z, x, y) = e^x y^2$.
(i) $\partial_3 h(y, x, z) =$ £f^


2.20. Finish the calculation of Example 2.2.8. 

2.21. Find $y\,\partial f/\partial z - z\,\partial f/\partial y$ and $z\,\partial f/\partial x - x\,\partial f/\partial z$ in terms of spherical coordinates. Warning! These will not be as nice-looking as the expression calculated in Example 2.2.8.


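2.22. Given that $f'(1) = 2$, find

$$
\mathbf{e}_x\,\frac{\partial f}{\partial x} + \mathbf{e}_y\,\frac{\partial f}{\partial y} + \mathbf{e}_z\,\frac{\partial f}{\partial z}
$$

for $f(xyz)$ at the Cartesian point $(-1, 2, -1/2)$.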

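2.23. Given that $f'(3) = -1$, find the radial component of the vector

$$
\mathbf{e}_x\,\frac{\partial f}{\partial x} + \mathbf{e}_y\,\frac{\partial f}{\partial y} + \mathbf{e}_z\,\frac{\partial f}{\partial z}
$$

for $f\big(\sqrt{x^2 + y^2 + z^2}\big)$ at the Cartesian point $(2, 1, -2)$.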

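2.24. Show that the function $F(\mathbf{k}\cdot\mathbf{r} - \omega t)$ satisfies the three-dimensional wave equation:

$$
\frac{\partial^2 F}{\partial x^2} + \frac{\partial^2 F}{\partial y^2} + \frac{\partial^2 F}{\partial z^2} - \frac{1}{c^2}\frac{\partial^2 F}{\partial t^2} = 0
$$

if $\mathbf{k} = (k_x, k_y, k_z)$ is a constant vector, $\omega$ is a constant, and a certain relation exists between $k = |\mathbf{k}|$ and $\omega$. Find this relation.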

2.25. In electromagnetic radiation theory one encounters an equation of the form

$$
t = \frac{1}{\sqrt{[x - f(t)]^2 + [y - g(t)]^2 + [z - h(t)]^2}},
$$

and one is interested in the partial derivative of $t$ with respect to $x$, $y$, and $z$. Note the hybrid role that $t$ plays here as both a dependent and an independent variable. Show that

$$
\frac{\partial t}{\partial x} = \frac{x - f(t)}{[x - f(t)]f'(t) + [y - g(t)]g'(t) + [z - h(t)]h'(t) - F^{3/2}},
$$

where $F(x, y, z, t) = [x - f(t)]^2 + [y - g(t)]^2 + [z - h(t)]^2$. Find similar expressions for the partial derivatives of $t$ with respect to $y$ and $z$.


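2.26. Consider the function $f(|\mathbf{r} - \mathbf{r}'|)$ with $\mathbf{r} = x\mathbf{e}_x + y\mathbf{e}_y + z\mathbf{e}_z$ and $\mathbf{r}' = x'\mathbf{e}_x + y'\mathbf{e}_y + z'\mathbf{e}_z$ being the position vectors of $P$ and $P'$.

(a) Find a general expression for the vector

$$
\mathbf{V} = \mathbf{e}_x\,\frac{\partial f}{\partial x} + \mathbf{e}_y\,\frac{\partial f}{\partial y} + \mathbf{e}_z\,\frac{\partial f}{\partial z}
$$

in terms of $\mathbf{r}$ and $\mathbf{r}'$.

(b) If $f'(3) = 3$ and the coordinates of $P$ and $P'$ are $(1, -1, 0)$ and $(0, 1, 2)$, respectively, find the numerical value of $\mathbf{V}$.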

2.27. Find an expression in cylindrical and spherical coordinates analogous 
to Equation (2.21). 

2.28. A function $f(x, y)$ of Cartesian coordinates can also be thought of as a function of some other coordinates $u$ and $v$ defined by

$$
x = u\sin v, \qquad y = u\cos v.
$$

(a) Applying the procedure of Example 2.3.2, find the unit vectors $\mathbf{e}_u$ and $\mathbf{e}_v$.

(b) Find $u$ and $v$ as functions of $x$ and $y$.

(c) Calculate $\mathbf{e}_x$ and $\mathbf{e}_y$ in terms of $\mathbf{e}_u$ and $\mathbf{e}_v$.

(d) Write the vector

$$
\mathbf{A} = \mathbf{e}_x\,\frac{\partial f}{\partial x} + \mathbf{e}_y\,\frac{\partial f}{\partial y}
$$

entirely in the $(u, v)$ coordinate system.


2.29. Find the cylindrical unit vectors in terms of Cartesian unit vectors 
using the procedure of Example 2.3.2. 


2.30. Find the spherical unit vectors in terms of Cartesian unit vectors using 
the procedure of Example 2.3.2. 

2.31. In the first part of Example 2.3.2, assume that $f(u, v) = u f_1(v)$ and $g(u, v) = u g_1(v)$, where $f_1$ and $g_1$ are functions of only one variable.

(a) Find a relation between $f_1$ and $g_1$ to make $\mathbf{e}_u$ and $\mathbf{e}_v$ perpendicular.

(b) Can you recover the polar coordinates as a special case of (a)?
2.32. The elliptic coordinates $(u, \theta)$ are given by

$$
x = a\cosh u\cos\theta, \qquad y = a\sinh u\sin\theta,
$$

where $a$ is a constant.

(a) What are the curves of constant $u$?

(b) What are the curves of constant $\theta$?

(c) Find $\mathbf{e}_u$ and $\mathbf{e}_\theta$ in terms of the Cartesian unit vectors, and examine their orthogonality.

2.33. The parabolic coordinates $(u, v)$ are given by

$$
x = a(u^2 - v^2), \qquad y = 2auv,
$$

where $a$ is a constant.

(a) What are the curves of constant $u$?

(b) What are the curves of constant $v$?

(c) Find $\mathbf{e}_u$ and $\mathbf{e}_v$ in terms of the Cartesian unit vectors, and examine their orthogonality.

2.34. The two-dimensional bipolar coordinates $(u, v)$ are given by

$$
x = \frac{a\sinh u}{\cosh u + \cos v}, \qquad y = \frac{a\sin v}{\cosh u + \cos v},
$$

where $a$ is a constant.

(a) What are the curves of constant $u$?

(b) What are the curves of constant $v$?

(c) Find $\mathbf{e}_u$ and $\mathbf{e}_v$ in terms of the Cartesian unit vectors, and examine their orthogonality.

2.35. The elliptic cylindrical coordinates $(u, \theta, z)$ are given by

$$
x = a\cosh u\cos\theta, \qquad y = a\sinh u\sin\theta, \qquad z = z,
$$

where $a$ is a constant.

(a) What are the surfaces of constant $u$?

(b) What are the surfaces of constant $\theta$?

(c) Find $\mathbf{e}_u$, $\mathbf{e}_\theta$, and $\mathbf{e}_z$ in terms of the Cartesian unit vectors, and examine their orthogonality.
2.36. The prolate spheroidal coordinates $(u, \theta, \varphi)$ are given by

$$
x = a\sinh u\sin\theta\cos\varphi, \qquad y = a\sinh u\sin\theta\sin\varphi, \qquad z = a\cosh u\cos\theta,
$$

where $a$ is a constant.

(a) What are the surfaces of constant $u$?

(b) What are the surfaces of constant $\theta$?

(c) Find $\mathbf{e}_u$, $\mathbf{e}_\theta$, and $\mathbf{e}_\varphi$ in terms of the Cartesian unit vectors, and examine their mutual orthogonality.

2.37. The toroidal coordinates $(u, \theta, \varphi)$ are given by

$$
x = \frac{a\sinh u\cos\varphi}{\cosh u - \cos\theta}, \qquad
y = \frac{a\sinh u\sin\varphi}{\cosh u - \cos\theta}, \qquad
z = \frac{a\sin\theta}{\cosh u - \cos\theta},
$$

where $a$ is a constant.

(a) What are the surfaces of constant $u$?

(b) What are the surfaces of constant $\theta$?

(c) Find $\mathbf{e}_u$, $\mathbf{e}_\theta$, and $\mathbf{e}_\varphi$ in terms of the Cartesian unit vectors, and examine their mutual orthogonality.

2.38. The paraboloidal coordinates $(u, v, \varphi)$ are given by

$$
x = 2auv\cos\varphi, \qquad y = 2auv\sin\varphi, \qquad z = a(u^2 - v^2),
$$

where $a$ is a constant.

(a) What are the surfaces of constant $u$?

(b) What are the surfaces of constant $v$?

(c) Find $\mathbf{e}_u$, $\mathbf{e}_v$, and $\mathbf{e}_\varphi$ in terms of the Cartesian unit vectors, and examine their mutual orthogonality.

2.39. The three-dimensional bipolar coordinates $(u, \theta, \varphi)$ are given by

$$
x = \frac{a\sin\theta\cos\varphi}{\cosh u - \cos\theta}, \qquad
y = \frac{a\sin\theta\sin\varphi}{\cosh u - \cos\theta}, \qquad
z = \frac{a\sinh u}{\cosh u - \cos\theta},
$$

where $a$ is a constant.

(a) What are the surfaces of constant $u$?

(b) What are the surfaces of constant $\theta$?

(c) Find $\mathbf{e}_u$, $\mathbf{e}_\theta$, and $\mathbf{e}_\varphi$ in terms of the Cartesian unit vectors, and examine their mutual orthogonality.

2.40. A coordinate system $(R, \Theta, \phi)$ in space is defined by

$$
x = R\cos\Theta\cos\phi + b\cos\phi, \qquad
y = R\cos\Theta\sin\phi + b\sin\phi, \qquad
z = R\sin\Theta,
$$

where $b$ is a constant, and $0 < R < b$.

1. Express the unit vectors $\mathbf{e}_R$, $\mathbf{e}_\Theta$, and $\mathbf{e}_\phi$ in terms of Cartesian unit vectors with coefficients being functions of $(R, \Theta, \phi)$.

2. Are the unit vectors mutually perpendicular?




Chapter 3 


Integration: Formalism 


It is not an exaggeration to say that the most important concept, whose mastery ensures a much greater understanding of all undergraduate physics, is the concept of integral. Generally speaking, physical laws are given in local form, while their application to the real world requires a departure from locality. For instance, Coulomb’s law in electrostatics and the universal law of gravity are both given in terms of point particles. These are mathematical points, and the laws assume that. In real physical situations, however, we never deal with a mathematical point. Usually, we approximate the objects under consideration as points, as in the case of the gravitational force between the Earth and the Sun. Whether such an approximation is good depends on the properties of the objects and the parameters of the law; in the example of gravity, it depends on the sizes of the Earth and the Sun as compared to the distance between them. On the other hand, the precise motion of a satellite circling the Earth requires more than approximating the Earth as a point; all the bumps and grooves of the Earth’s surface will affect the satellite’s motion.

This chapter is devoted to a thorough discussion of integrals from a physical standpoint, i.e., the meaning and the use of the concept of integration rather than the technique and the art of evaluating integrals.


3.1 “∫” Means “∫um”

One of the first difficulties we have to overcome is the preconception instilled in all of us from calculus that an integral is “area under a curve.” This preconception is so strong that in some introductory physics books the authors translate physical concepts, in which the integral plays a natural role, into the unphysical and unnatural notion of area under a curve. It is true that the calculation of the area under a curve employs the concept of integration, but it does so only because the calculation happens to be the limit of a sum, and such limits find their natural habitat in many physical situations.


Physical laws are given for mathematical points but applied to extended objects.

Integral is not just area under a curve!
calculation of fields of continuous distribution of sources as the natural setting for the concept of integral


Take the gravitational force, for example. As a fundamental physical law, it is given for point masses, but when we want to calculate the force between the Earth and the Moon, we cannot apply the law directly, because the Earth and the Moon cannot be considered as points when the Moon is only 60 Earth radii away. This problem was recognized by Newton, who found its solution in integration. Inherent in the concept of integration is the superposition principle whereby, as mentioned in Chapter 1, different parts of a system are assumed to act independently. Thus a natural procedure is to divide the big Earth and the big Moon into small pieces, write down the gravitational force between these small pieces, invoke the superposition principle, and add the contributions of these pieces to get the total force. Nothing is more natural than this process, and no example illustrates integration better than such a calculation.

In order to define and elucidate the concept of integration,¹ let us reconsider the gravitational field of Box 1.3.5. Instead of a known collection of point masses, let us calculate the gravitational field at a point $P$ of a continuous distribution of mass, such as that distributed in the volume of the Earth. The point $P$ is called the field point.² We divide the large mass into $N$ pieces, denoting the mass of the $i$th piece, located around the point $P_i$, by $\Delta m_i$, as shown in Figure 3.1. To be able to even write the field equation for the $i$th piece of mass, we have to make sure that the size of $\Delta m_i$ is small enough. We thus write

$$
\mathbf{g}_i \approx -\frac{G\,\Delta m_i}{|\mathbf{r} - \mathbf{r}_i|^3}\,(\mathbf{r} - \mathbf{r}_i).
$$




Figure 3.1: The mass distribution giving rise to a gravitational field. (a) The mass is divided into discrete pieces labeled 1 through $N$ with the $i$th piece singled out. (b) The mass is divided into infinitesimal pieces with the piece located at $x'$ singled out.


1 The discussion that follows may seem specific to one example, but in reality, it is much more general. Instead of the gravitational law one can substitute any other local law, and instead of mass, the appropriate physical quantity must be used. The examples that follow throughout this chapter will clarify any vague points.

2 The same terminology applies to electrostatic fields as well. 
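The smaller the size, the better this expression approximates the field due to $\Delta m_i$. Invoking the superposition principle, we write

$$
\mathbf{g}(\mathbf{r}) \approx \sum_{i=1}^{N} \mathbf{g}_i = -\sum_{i=1}^{N} \frac{G\,\Delta m_i}{|\mathbf{r} - \mathbf{r}_i|^3}\,(\mathbf{r} - \mathbf{r}_i) = -\sum_{i=1}^{N} \frac{G\,\Delta m(\mathbf{r}_i)}{|\mathbf{r} - \mathbf{r}_i|^3}\,(\mathbf{r} - \mathbf{r}_i), \tag{3.1}
$$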


where in the last equality we have replaced $\Delta m_i$ with $\Delta m(\mathbf{r}_i)$. Aside from a change in notation, this replacement emphasizes the dependence of the small piece of mass on its “location.” The quotation marks around the last word need some elaboration. In any practical slicing of the gravitating object, such as the Earth, each piece still has some nonzero size. This makes it impossible to define the distance between $\Delta m_i$ and the point $P$ unambiguously. We can define this distance to be that of the “center” of $\Delta m_i$ from $P$, but then the difficulty shifts to defining the center of the piece. Fortunately, it turns out that, as long as we ultimately make the sizes of all the $\Delta m_i$'s indefinitely small, any point in $\Delta m_i$, such as $P_i$ shown in the figure, can be chosen to define its distance from $P$. We are thus led to taking the limit of Equation (3.1) as the size of all pieces tends to zero and, necessarily, the number of pieces tends to infinity.

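integral as the limit of a sum

If such a limit exists, we call it the integral of the gravitational field and denote it as follows:³

$$
\mathbf{g}(\mathbf{r}) = -\lim_{\substack{\Delta m \to 0\\ N \to \infty}} \sum_{i=1}^{N} \frac{G\,\Delta m(\mathbf{r}_i)}{|\mathbf{r} - \mathbf{r}_i|^3}\,(\mathbf{r} - \mathbf{r}_i) \equiv -\int_{\Omega} \frac{G\,dm(\mathbf{r}')}{|\mathbf{r} - \mathbf{r}'|^3}\,(\mathbf{r} - \mathbf{r}'). \tag{3.2}
$$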


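An identical procedure leads to a similar formula for the electrostatic field and potential:

$$
\mathbf{E}(\mathbf{r}) = \int_{\Omega} \frac{k_e\,dq(\mathbf{r}')}{|\mathbf{r} - \mathbf{r}'|^3}\,(\mathbf{r} - \mathbf{r}'), \qquad \Phi(\mathbf{r}) = \int_{\Omega} \frac{k_e\,dq(\mathbf{r}')}{|\mathbf{r} - \mathbf{r}'|}. \tag{3.3}
$$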


Equations (3.2) and (3.3) will be used frequently in the sequel as we try to illustrate their use in various physical situations. Note that Equations (3.2) and (3.3) are independent of any coordinate system, as all physical laws should be.
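To see the limit of a sum behind Equation (3.2) at work, here is a small sketch of our own (not from the original text). It assumes a hypothetical uniform rod of mass $M$ and length $L$ lying on the $x$-axis from $0$ to $L$, with the field point on the axis at $x = -d$; the function name and all numbers are illustrative. Chopping the rod into $N$ equal pieces, representing each piece by its midpoint $P_i$, and adding the point-mass fields as in Equation (3.1) gives a sum that should approach the exact magnitude $GM/[d(d+L)]$ as $N$ grows.

```python
G = 6.67e-11  # gravitational constant, N m^2 / kg^2


def rod_field_sum(M: float, L: float, d: float, N: int) -> float:
    """x-component of g at x = -d from a uniform rod on [0, L], via a sum like Eq. (3.1).

    Each piece has mass dm = M / N and is represented by its midpoint x_i.
    """
    dm = M / N
    r = -d  # position of the field point P on the x-axis
    g_x = 0.0
    for i in range(N):
        x_i = (i + 0.5) * L / N       # representative point P_i of the ith piece
        sep = r - x_i                  # (r - r_i), a one-dimensional vector here
        g_x += -G * dm * sep / abs(sep) ** 3
    return g_x


M, L, d = 1000.0, 2.0, 1.0
exact = G * M / (d * (d + L))
for N in (1, 10, 100, 1000):
    print(N, rod_field_sum(M, L, d, N), exact)
```

With a single piece the rod is treated as a point mass at its center, which underestimates the field; refining the partition drives the sum toward the exact value, which is precisely the limit that defines the integral.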

region of integration

integration point, integration variables, and integrand defined

In the symbolic representation of the integral on the RHS of Equation (3.2), $\Omega$, called the region of integration,⁴ is the region (for example, the volume of the Earth) in which the mass distribution resides, and $dm(\mathbf{r}')$ is called an element of mass located⁵ at point $P'$, whose position vector is $\mathbf{r}'$. $P'$ is called the source point because it is the location of the source of the gravitational field, i.e., the mass element at that point. We also call it the integration point. The coordinates of $\mathbf{r}'$ upon which the mass element depends, and in terms of which it will eventually be expressed, are called the integration variables, and whatever multiplies the products of the differentials of these variables is called the integrand.

3 We shall use the symbol $\int_\Omega$ (or simply $\int$) to indicate general integration without regard to the dimensionality (single, double, or triple) of the integral.

4 When the region of integration is one dimensional, such as an interval $(a, b)$ on the real line, one uses $\int_a^b$ instead of $\int_{(a,b)}$.

5 Whenever $\mathbf{r}'$ is used as an argument of a quantity, it will refer to the coordinates of a point, not the components of its position vector.

integration parameters

It is not hard to abstract the concept of integration from the specific example of gravity. Instead of the specific form of the integrand in Equations (3.2) and (3.3), we use $f(\mathbf{r}, \mathbf{r}')$, and instead of the element of mass, we use the element of some other quantity, which we generically designate as $dQ(\mathbf{r}')$. We thus write

$$
h(\mathbf{r}) = \lim_{\substack{\Delta Q \to 0\\ N \to \infty}} \sum_{i=1}^{N} f(\mathbf{r}, \mathbf{r}_i)\,\Delta Q(\mathbf{r}_i) \equiv \int_{\Omega} f(\mathbf{r}, \mathbf{r}')\,dQ(\mathbf{r}'), \tag{3.4}
$$

where $h(\mathbf{r})$, the result of integration, will be a function of $\mathbf{r}$, the position vector of $P$, whose coordinates are called the parameters of integration. Although we have used $\mathbf{r}$ and $\mathbf{r}'$, the concept of integration does not require the parameters and integration variables to be position vectors. They could be any collection of parameters and variables. Nevertheless, we continue to use the terminology of position vectors and call such collections the coordinates of points.

An immediate (and important) consequence of the definition of integral is that if the region of integration $\Omega$ is small, then, for practical calculations, we do not need to subdivide it into many pieces. In fact, if $\Omega$ is small enough, only one piece may be a good approximation to the integral. We thus write

$$
\int_{\Delta\Omega} f(\mathbf{r}, \mathbf{r}')\,dQ(\mathbf{r}') \approx f(\mathbf{r}, \mathbf{r}_M)\,\Delta Q, \tag{3.5}
$$

where it is understood that $\Delta\Omega$ is a small region around point $M$ whose “position vector” is $\mathbf{r}_M$.

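Another immediate and important consequence of the definition of integral is that if $\Omega$ is divided into two regions $\Omega_1$ and $\Omega_2$, then

$$
\int_{\Omega} f(\mathbf{r}, \mathbf{r}')\,dQ(\mathbf{r}') = \int_{\Omega_1} f(\mathbf{r}, \mathbf{r}')\,dQ(\mathbf{r}') + \int_{\Omega_2} f(\mathbf{r}, \mathbf{r}')\,dQ(\mathbf{r}'). \tag{3.6}
$$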
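Both consequences of the definition can be checked with the simplest possible "quantity" $Q$, namely length on the real line. The sketch below is our own illustration; the helper `riemann` and the choice $f = e^t$ are assumptions made for the example. It evaluates the defining sum of Equation (3.4) with equal pieces, splits the region in two as in Equation (3.6), and uses a single piece on a small region as in Equation (3.5).

```python
import math
from typing import Callable


def riemann(f: Callable[[float], float], a: float, b: float, n: int) -> float:
    """Sum of f(midpoint) * dQ over n equal pieces of (a, b), as in Eq. (3.4)."""
    dq = (b - a) / n
    return sum(f(a + (i + 0.5) * dq) for i in range(n)) * dq


f = math.exp
whole = riemann(f, 0.0, 1.0, 2000)

# Eq. (3.6): split (0, 1) at 0.3 and add the contributions of the two regions.
split = riemann(f, 0.0, 0.3, 600) + riemann(f, 0.3, 1.0, 1400)

# Eq. (3.5): a small region (0.50, 0.51) represented by a single piece.
small_one_piece = riemann(f, 0.50, 0.51, 1)
small_exact = math.exp(0.51) - math.exp(0.50)

print(whole, split, math.e - 1)
print(small_one_piece, small_exact)
```

The split uses the same piece size as the whole, so the two sums agree to rounding, and the single-piece estimate on the short interval differs from the exact value by only a few parts per million.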

In order to be able to evaluate integrals, one has to express both $dQ(\mathbf{r}')$ and $f(\mathbf{r}, \mathbf{r}')$ in terms of a suitable set of coordinates. $f(\mathbf{r}, \mathbf{r}')$ poses no problem, and in most cases it involves a mere substitution. The element of $Q$, on the other hand, is often related, via density, to the element of volume (or area, or length), whose expression is more involved. Section 2.3 dealt with the construction of elements of length, area, and volume in the three coordinate systems.


Integral calculus, in its geometric form, was known to the ancient Greeks. For 
example, Euclid, by adding pieces to the area of a square inscribed in a circle, 
constructing new polygons with larger numbers of sides, and continuing the process
until the circle is “exhausted” by regular polygons, proved the theorem: Circles 
are to one another as the squares on the diameters. In essence, Euclid thinks of a 
circle as the limiting case of a regular polygon and proves the above theorem for 
polygons. Then he uses the argument of “exhaustion” to get to the result. Although 
mathematicians of antiquity made frequent use of the method of exhaustion, no one 
did it with the mastery of Archimedes. 

Archimedes is arguably the greatest mathematician of antiquity.
The son of an astronomer, he was born in Syracuse, a Greek settlement in Sicily. 
As a young man he went to Alexandria to study mathematics, and although he 
went back to Syracuse to spend the rest of his life there, he never lost contact with 
Alexandria. 

Archimedes possessed a lofty intellect, great breadth of interest (both theoretical and practical), and excellent mechanical skills. He is credited with finding the areas and volumes of many geometric figures using the method of exhaustion, the calculation of π, a new scheme of presenting large numbers in verbal language, finding the centers of gravity of many solids and plane figures, and founding the science of hydrostatics.

His great achievements in mathematics—he is ranked with Newton and Gauss 
as one of the three greatest mathematicians of all time—did not overshadow his 
practical inventions. He invented the first planetarium and a pump (Archimedean 
screw). He showed how to use levers to move great weights, and used compound 
pulleys to launch a galley of the king of Syracuse. Taking advantage of the focusing 
power of a parabolic mirror, so the story goes, he concentrated the Sun’s rays on 
the Roman ships besieging Syracuse and burned them! 

Perhaps the most famous story about Archimedes is his discovery of the method of testing the debasement of a crown of gold. The king of Syracuse had ordered the crown. Upon delivery, he suspected that it was filled with baser metal and sent it to Archimedes to test it for purity. Archimedes pondered the problem for some time, until one day, as he was taking a bath, he observed that his body was partly buoyed up by the water and suddenly grasped the principle, now called Archimedes’ principle, by which he could solve the problem. He was so excited about the discovery that he forgetfully ran out into the street naked, shouting “Eureka!” (“I have found it!”).



Archimedes 
287-212 B.C. 


3.2 Properties of Integral 

Now that we have developed the formalism of integration, we should look 
at some applications in which integrals are evaluated. As we shall see, all 
integral evaluations eventually reduce to integrals involving only one variable. 
Thus, it is important to have a thorough understanding of the properties 
of such integrals. Some of these properties are familiar, others may be less 
familiar or completely new. We gather all these properties here for the sake 
of completeness. 
Feel free to use any symbol you like for the integration variable!


3.2.1 Change of Dummy Variable 

The symbol used as the variable of integration in the integral is completely 
irrelevant. Thus, we have 


$$
\int_{t_1}^{t_2} g(t)\,dt = \int_{t_1}^{t_2} g(x)\,dx = \int_{t_1}^{t_2} g(s)\,ds = \int_{t_1}^{t_2} g(t')\,dt' = \int_{t_1}^{t_2} g(\star)\,d\star .
$$

Note how the limits of integration remain the same in all integrals. The fact 
that these limits use the same symbol as the first dummy variable should not 
confuse the reader. What is important is that they are fixed real numbers. 


piecewise continuous functions


3.2.2 Linearity 

For arbitrary constant real numbers $a$ and $b$, we have

$$
\int_{\Omega} [a f(t) + b g(t)]\,dt = a\int_{\Omega} f(t)\,dt + b\int_{\Omega} g(t)\,dt.
$$

3.2.3 Interchange of Limits 

Interchanging the limits of integration introduces a minus sign:

$$
\int_a^b f(t)\,dt = -\int_b^a f(t)\,dt.
$$

This relation implies that $\int_s^s f(t)\,dt = 0$. (Show this implication!)

3.2.4 Partition of Range of Integration 

If $q$ is a real number between the two limits, i.e., if $p < q < r$, then

$$
\int_p^r f(t)\,dt = \int_p^q f(t)\,dt + \int_q^r f(t)\,dt,
$$

which is a special case of Equation (3.6). This property is used to integrate piecewise continuous functions, i.e., functions that have a finite number of discontinuities in the interval of integration. For instance, suppose $f(t)$ is defined to be

$$
f(t) = \begin{cases}
f_1(t) & \text{if } p < t < q_1,\\
f_2(t) & \text{if } q_1 < t < q_2,\\
f_3(t) & \text{if } q_2 < t < r,
\end{cases}
$$

where $f_1(t)$, $f_2(t)$, and $f_3(t)$ are, in general, totally unrelated (continuous) functions. Then one divides the interval of integration into three natural parts and writes

$$
\int_p^r f(t)\,dt = \int_p^{q_1} f_1(t)\,dt + \int_{q_1}^{q_2} f_2(t)\,dt + \int_{q_2}^{r} f_3(t)\,dt.
$$
Figure 3.2: The integral is defined as long as there is only a finite number of discontinuities (jumps) in the function.


This is illustrated in Figure 3.2. 
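The splitting recipe above translates directly into code: integrate each continuous piece over its own subinterval and add. The sketch below is our own example, not from the text; the particular pieces $f_1(t) = t$, $f_2(t) = 5$, $f_3(t) = t^2$ with jumps at $q_1 = 1$ and $q_2 = 2$ are hypothetical, and each piece is integrated with a simple midpoint rule.

```python
def midpoint(f, a, b, n=1000):
    """Midpoint-rule approximation of the integral of f over (a, b)."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))


# Hypothetical piecewise-defined function on (p, r) = (0, 3) with jumps at q1 = 1, q2 = 2.
def f1(t):
    return t


def f2(t):
    return 5.0


def f3(t):
    return t * t


p, q1, q2, r = 0.0, 1.0, 2.0, 3.0
total = midpoint(f1, p, q1) + midpoint(f2, q1, q2) + midpoint(f3, q2, r)
exact = 0.5 + 5.0 + (r**3 - q2**3) / 3
print(total, exact)
```

Because no subinterval straddles a jump, each call sees a smooth integrand, and the sum agrees with the exact value $0.5 + 5 + 19/3$ to better than $10^{-6}$.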


3.2.5 Transformation of Integration Variable 

When evaluating an integral, it is sometimes convenient to use a new variable of integration of which the old one is a function. Call the new integration variable $y$ and assume that $t = h(y)$. Then we have

$$
\int_a^b f(t)\,dt = \int_p^q f(h(y))\,h'(y)\,dy, \tag{3.9}
$$

where $p$ and $q$ are the solutions to the two equations

$$
a = h(p), \qquad b = h(q).
$$

Each of these two equations must have a unique solution; otherwise, the transformation of the integration variable will not be a valid procedure. This condition puts restrictions on the type of function $h$ can be. Note that we have essentially substituted $h(y)$ for $t$ in the original integral, including the differential $h'(y)\,dy$ for $dt$. It is vital to remember to change the limits of integration when transforming variables.


Transformation of integration variable accompanies a change in the limits of integration.
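As a numerical check of Equation (3.9), consider $\int_0^1 \sqrt{1 - t^2}\,dt = \pi/4$ with the substitution $t = h(y) = \sin y$. The equations $0 = h(p)$ and $1 = h(q)$ have the unique solutions $p = 0$ and $q = \pi/2$ on $(0, \pi/2)$. The sketch below (our own example; the helper names are illustrative) evaluates both sides with a midpoint rule, making the change of limits explicit.

```python
import math


def midpoint(f, a, b, n=2000):
    """Midpoint-rule approximation of the integral of f over (a, b)."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))


def f(t):
    return math.sqrt(1.0 - t * t)


def h(y):  # the substitution t = h(y) = sin y
    return math.sin(y)


def h_prime(y):
    return math.cos(y)


# New limits: solutions of 0 = h(p) and 1 = h(q) on (0, pi/2).
p, q = 0.0, math.pi / 2
original = midpoint(f, 0.0, 1.0)
transformed = midpoint(lambda y: f(h(y)) * h_prime(y), p, q)
print(original, transformed, math.pi / 4)
```

The transformed integrand, $\cos^2 y$, is smooth, so it converges faster than the original one, whose derivative blows up at $t = 1$. This is a common practical reason for changing variables.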


3.2.6 Small Region of Integration 


When is the region of integration small?

When the region of integration is small, in the sense that the integrand does not change much over the range of integration, the integral can be approximated by the product of the integrand and the size of the range.⁶ We thus can write

$$
\int_a^b f(t)\,dt \approx (b - a)\,f(t_0), \tag{3.10}
$$

where $t_0$ is a number between $a$ and $b$, usually taken to be the midpoint of the interval $(a, b)$.

6 This is simply a restatement of Equation (3.5) for the case of one variable.
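How small is "small"? The sketch below (our own example with $f = \cos$ and $t_0$ at the midpoint; the function names are illustrative) compares the one-piece estimate of Equation (3.10) with the exact integral $\sin b - \sin a$ for shrinking intervals. For this integrand the relative error behaves roughly like $(b-a)^2/24$.

```python
import math


def exact_integral(a, b):
    """Integral of cos t over (a, b)."""
    return math.sin(b) - math.sin(a)


def one_piece(a, b):
    """Eq. (3.10) with t0 taken as the midpoint of (a, b)."""
    return (b - a) * math.cos(0.5 * (a + b))


for width in (1.0, 0.1, 0.01):
    a, b = 0.3, 0.3 + width
    print(width, one_piece(a, b), exact_integral(a, b))
```

For a width of 1 the estimate is off by about 4%, while for a width of 0.01 it is off by only a few parts per million.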
3.2.7 Integral and Absolute Value

A useful property of integrals that we shall be using sometimes is (for $a < b$)

$$
\left|\int_a^b f(t)\,dt\right| \le \int_a^b |f(t)|\,dt. \tag{3.11}
$$

This should be clear once we realize that an integral is the limit of a sum and the absolute value of a sum is always less than or equal to the sum of the absolute values.
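A concrete instance of Equation (3.11), chosen by us for illustration: over $(0, 3\pi/2)$ the positive and negative lobes of $\sin t$ partly cancel, giving $\int \sin t\,dt = 1$, whereas $\int |\sin t|\,dt = 3$. The sketch below checks this with the same kind of midpoint sum used to define the integral.

```python
import math


def midpoint(f, a, b, n=10000):
    """Midpoint-rule approximation of the integral of f over (a, b)."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))


a, b = 0.0, 3 * math.pi / 2
lhs = abs(midpoint(math.sin, a, b))                  # |integral of sin t|, exact value 1
rhs = midpoint(lambda t: abs(math.sin(t)), a, b)    # integral of |sin t|, exact value 3
print(lhs, rhs, lhs <= rhs)
```

Equality in (3.11) would require $f$ to keep a single sign over the interval, which $\sin t$ does not do here.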


even and odd functions defined

3.2.8 Symmetric Range of Integration 

By a symmetric range of integration, we mean a range that has 0 (the origin) as its midpoint. For certain functions, partitioning such a range into two equal pieces can simplify the evaluation of the integral considerably. So, let us write

$$
\int_{-T}^{+T} f(t)\,dt = \int_{-T}^{0} f(t)\,dt + \int_{0}^{+T} f(t)\,dt.
$$

For the first integral, make a change of variable $t = -y$ to obtain

$$
h(y) = -y \quad\Rightarrow\quad h'(y)\,dy = (-1)\,dy = -dy.
$$

The limits of integration in $y$ are determined by

$$
h(y_{\text{lower}}) = -T, \quad h(y_{\text{upper}}) = 0 \quad\Rightarrow\quad y_{\text{lower}} = +T, \quad y_{\text{upper}} = 0.
$$

We therefore have

$$
\int_{-T}^{0} f(t)\,dt = \int_{+T}^{0} f(-y)\,(-dy) = \int_{0}^{+T} f(-y)\,dy = \int_{0}^{+T} f(-t)\,dt,
$$

where we have used the properties in Subsections 3.2.3 and 3.2.1. Combining our results and using the linearity property of Subsection 3.2.2, we get

$$
\int_{-T}^{+T} f(t)\,dt = \int_{0}^{+T} f(-t)\,dt + \int_{0}^{+T} f(t)\,dt = \int_{0}^{+T} [f(t) + f(-t)]\,dt. \tag{3.12}
$$

A real-valued function $f$ is called even if $f(-x) = f(x)$, and odd if $f(-x) = -f(x)$. Thus, from Equation (3.12), we obtain

$$
\int_{-T}^{+T} f(t)\,dt = \int_{0}^{+T} [f(t) + f(t)]\,dt = 2\int_{0}^{+T} f(t)\,dt \tag{3.13}
$$

when $f$ is even, and

$$
\int_{-T}^{+T} f(t)\,dt = \int_{0}^{+T} [f(t) - f(t)]\,dt = 0 \tag{3.14}
$$

when it is odd.
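Equations (3.13) and (3.14) can be checked numerically with any convenient even and odd functions. In the sketch below, $t^2\cos t$ and $t^3 + \sin t$ are our own choices, not taken from the text. With midpoints placed symmetrically about the origin, the sum for the odd function cancels pair by pair, which mirrors the argument leading to (3.14).

```python
import math


def midpoint(f, a, b, n=4000):
    """Midpoint-rule approximation of the integral of f over (a, b)."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))


def even(t):  # even(-t) == even(t)
    return t**2 * math.cos(t)


def odd(t):   # odd(-t) == -odd(t)
    return t**3 + math.sin(t)


T = 2.0
print(midpoint(even, -T, T), 2 * midpoint(even, 0.0, T))  # Eq. (3.13)
print(midpoint(odd, -T, T))                                # Eq. (3.14): essentially 0
```

The two numbers on the first line agree up to the small discretization error of the midpoint rule.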
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3.2.9 Differentiating an Integral 

We have seen that an integral can have an integrand which depends on a set of 
parameters, and that the result of integration will depend on these parameters. 
Thus, we can think of the integral as a function of those parameters, and in 
particular, we may want to know its derivative with respect to one of the 
parameters. Using the definition of integral as the limit of a sum, and the 
fact that the derivative of a sum is the sum of derivatives, it is easy to show 
that 


$$\frac{d}{dx_i}\int_a^b f(x_1, x_2, \ldots, x_n, t)\,dt = \int_a^b \frac{\partial}{\partial x_i} f(x_1, x_2, \ldots, x_n, t)\,dt, \qquad (3.15)$$


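where we have represented the list of parameters as $(x_1, x_2, \ldots, x_n)$. We can
write exactly the same relation for the integral of Equation (3.4). Assuming
that $\mathbf{r} = (x_1, x_2, \ldots, x_n)$, we have

$$\frac{\partial}{\partial x_i}\int_\Omega f(\mathbf{r}, \mathbf{r}')\,dQ(\mathbf{r}') = \int_\Omega \frac{\partial}{\partial x_i} f(\mathbf{r}, \mathbf{r}')\,dQ(\mathbf{r}'). \qquad (3.16)$$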


In both cases the region of integration is assumed to be independent of $x_i$.

Restricting ourselves to single integrals,⁷ we now consider the case where
the limits of integration depend on some parameters. First, consider an integral
of the form

$$\int_u^v f(t)\,dt$$

and treat the result as a function of the limits. So, let us write

$$F(u, v) = \int_u^v f(t)\,dt \;\Rightarrow\; F(v, u) = -F(u, v)$$


and evaluate the partial derivative of F with respect to its arguments: 


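$$\frac{\partial F}{\partial u} \equiv \partial_1 F(u, v) = \lim_{\epsilon\to 0}\frac{F(u+\epsilon, v) - F(u, v)}{\epsilon} = \lim_{\epsilon\to 0}\frac{\int_{u+\epsilon}^{v} f(t)\,dt - \int_{u}^{v} f(t)\,dt}{\epsilon}$$
$$= -\lim_{\epsilon\to 0}\frac{\int_{u}^{u+\epsilon} f(t)\,dt + \int_{u+\epsilon}^{v} f(t)\,dt - \int_{u+\epsilon}^{v} f(t)\,dt}{\epsilon} = -\lim_{\epsilon\to 0}\frac{\int_{u}^{u+\epsilon} f(t)\,dt}{\epsilon}$$
$$= -\lim_{\epsilon\to 0}\frac{\epsilon f(u_0)}{\epsilon} = -\lim_{\epsilon\to 0} f(u_0) = -f(u).$$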

In the step introducing $u_0$ we used the mean value theorem for integrals,
$\int_u^{u+\epsilon} f(t)\,dt = \epsilon f(u_0)$ for some $u_0$ between u and u + ε. The last
equality follows from the fact that as ε → 0, $u_0$ is squeezed to u. Note that the
derivative above is independent of the second variable. To find the other derivative,
we use the result obtained above and simply note that


$$\frac{\partial F(u, v)}{\partial v} = -\frac{\partial F(v, u)}{\partial v} = -\partial_1 F(v, u) = -(-f(v)) = f(v).$$


7 Since all multiple integrals are reducible to single integrals, this restriction is not severe. 
Putting these two results together, we can write

$$\frac{\partial}{\partial v}\int_u^v f(t)\,dt = f(v), \qquad \frac{\partial}{\partial u}\int_u^v f(t)\,dt = -f(u). \qquad (3.17)$$

In words, 




Box 3.2.1. The derivative of an integral with respect to its upper (lower) 
limit equals the integrand (minus the integrand) evaluated at the upper 
(lower) limit. 


By evaluation, we mean replacing the variable of integration with the limit in
question. If the integrand has parameters, they are to be left alone.
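Box 3.2.1 can be tested with central finite differences. In this Python sketch the integrand f(t) = e^(−t) sin 3t, the limits u = 0.4 and v = 1.7, the step ε = 10⁻⁴, and the Simpson-rule helper are all assumptions made for illustration.

```python
import math


def simpson(f, a, b, n=200):
    """Composite Simpson rule on (a, b); n must be even. Also works for b < a."""
    step = (b - a) / n
    total = f(a) + f(b)
    total += 4 * sum(f(a + k * step) for k in range(1, n, 2))
    total += 2 * sum(f(a + k * step) for k in range(2, n, 2))
    return total * step / 3


def f(t):
    return math.exp(-t) * math.sin(3 * t)


def F(u, v):
    """F(u, v): the integral of f from u to v."""
    return simpson(f, u, v)


u, v, eps = 0.4, 1.7, 1e-4
dF_du = (F(u + eps, v) - F(u - eps, v)) / (2 * eps)   # Box 3.2.1 predicts -f(u)
dF_dv = (F(u, v + eps) - F(u, v - eps)) / (2 * eps)   # Box 3.2.1 predicts +f(v)
print(dF_du, -f(u))
print(dF_dv, f(v))
```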

By combining Equations (3.15) and (3.17) we can derive the most general 
equation. So, assume that both u and v are functions of $(x_1, x_2, \ldots, x_n)$, and
write 

$$G(x_1, x_2, \ldots, x_n, u, v) = \int_{u(x_1, x_2, \ldots, x_n)}^{v(x_1, x_2, \ldots, x_n)} f(x_1, x_2, \ldots, x_n, t)\,dt.$$


Then, using the chain rule, we get

$$D_i G = \frac{\partial G}{\partial u}\frac{\partial u}{\partial x_i} + \frac{\partial G}{\partial v}\frac{\partial v}{\partial x_i} + \partial_i G,$$

where $D_i G$ stands for the "total" derivative with respect to $x_i$. This means
that the dependence of u and v on $x_i$ is taken into account. In contrast, $\partial_i G$
is evaluated assuming that u and v are constants. We note that


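$$\frac{\partial G}{\partial u} = \frac{\partial}{\partial u}\int_u^v f(x_1, x_2, \ldots, x_n, t)\,dt = -f(x_1, x_2, \ldots, x_n, u),$$
$$\frac{\partial G}{\partial v} = \frac{\partial}{\partial v}\int_u^v f(x_1, x_2, \ldots, x_n, t)\,dt = f(x_1, x_2, \ldots, x_n, v),$$
$$\frac{\partial G}{\partial x_i} = \frac{\partial}{\partial x_i}\int_u^v f(x_1, x_2, \ldots, x_n, t)\,dt = \int_u^v \frac{\partial}{\partial x_i} f(x_1, x_2, \ldots, x_n, t)\,dt,$$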


where the partial derivative in the last equation treats u and v as constants. 
It follows that 


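Box 3.2.2. The most general formula for the derivative of an integral is

$$\frac{d}{dx_i}\int_{u(\mathbf{r})}^{v(\mathbf{r})} f(\mathbf{r}, t)\,dt = f(\mathbf{r}, v(\mathbf{r}))\,\frac{\partial v}{\partial x_i} - f(\mathbf{r}, u(\mathbf{r}))\,\frac{\partial u}{\partial x_i} + \int_{u(\mathbf{r})}^{v(\mathbf{r})} \frac{\partial f}{\partial x_i}(\mathbf{r}, t)\,dt,$$

where $\mathbf{r} = (x_1, x_2, \ldots, x_n)$.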


As indicated in Equation (2.16), it is common to ignore the difference between
$D_i$ and $\partial_i$, and the formula in Box 3.2.2 reflects this.
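The formula in Box 3.2.2 can likewise be checked numerically for a single parameter x (n = 1). The test case f(x, t) = cos(xt), u(x) = x², v(x) = 1 + x, the point x = 0.8, and all helper names are hypothetical choices for this Python sketch. The finite-difference derivative of G should agree with the two boundary terms plus the integral of ∂f/∂x.

```python
import math


def simpson(f, a, b, n=400):
    """Composite Simpson rule on (a, b); n must be even."""
    step = (b - a) / n
    total = f(a) + f(b)
    total += 4 * sum(f(a + k * step) for k in range(1, n, 2))
    total += 2 * sum(f(a + k * step) for k in range(2, n, 2))
    return total * step / 3


# Hypothetical test case: f(x, t) = cos(x t) with limits u(x) = x**2, v(x) = 1 + x.
def f(x, t):
    return math.cos(x * t)


def df_dx(x, t):
    return -t * math.sin(x * t)    # partial derivative of f with respect to x


def u(x):
    return x**2


def v(x):
    return 1.0 + x


def G(x):
    return simpson(lambda t: f(x, t), u(x), v(x))


def leibniz_rhs(x):
    """Right-hand side of Box 3.2.2: two boundary terms plus the interior integral."""
    du_dx, dv_dx = 2 * x, 1.0
    boundary = f(x, v(x)) * dv_dx - f(x, u(x)) * du_dx
    interior = simpson(lambda t: df_dx(x, t), u(x), v(x))
    return boundary + interior


x0, eps = 0.8, 1e-5
finite_difference = (G(x0 + eps) - G(x0 - eps)) / (2 * eps)
print(finite_difference, leibniz_rhs(x0))
```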
3.2.10 Fundamental Theorem of Calculus

A special case of Box 3.2.2 is extremely useful. Consider a function g of
a single variable x. We want to find a function, called the primitive, or
antiderivative, or indefinite integral,⁸ whose derivative is g. This can be
easily done using integrals. In fact, using Box 3.2.2, we have

$$G(x) = \int_a^x g(s)\,ds \;\Rightarrow\; \frac{dG}{dx} = \frac{d}{dx}\int_a^x g(s)\,ds = g(x), \qquad (3.18)$$

where a is an arbitrary constant. We can add an arbitrary constant to the 
RHS of the above equation and still get a primitive. Adding such a constant, 
evaluating both sides at x = a, and noting that the integral vanishes, we find 
that the constant must be G(a). We, therefore, obtain 

$$G(x) - G(a) = \int_a^x g(s)\,ds. \qquad (3.19)$$

Now suppose that F(x) is any function whose derivative is g(x). Then, 
from Equation (3.18), we see that 

$$\frac{d}{dx}\left[G(x) - F(x)\right] = g(x) - g(x) = 0.$$

Therefore, G(x) — F(x) must be a constant C. It now follows from (3.19) that 
$$F(x) - F(a) = G(x) - C - [G(a) - C] = G(x) - G(a) = \int_a^x g(s)\,ds,$$

and we have 


Box 3.2.3. (Fundamental Theorem of Calculus). Let F(x) be any
primitive of g(x) defined in the interval (a, b), i.e., any function whose
derivative is g(x) in that interval. Then,

$$F(b) - F(a) = \int_a^b g(s)\,ds. \qquad (3.20)$$


The founders of calculus such as Barrow, Newton, and Leibniz thought of
an integral as a sum. At first no connection between integration and
differentiation had been established, and to obtain the result of an integral one
had to go through the painstaking process of adding the terms of an (infinite)
sum. Only later did the founders of calculus realize (but not prove)
that the process of summation and taking limits was intimately connected

8 We would like to emphasize the concept of integral as the limit of a sum. Therefore,
we think it is better to reserve the word “integral” for such sums and will avoid using the
phrase “indefinite integral.”




to the process of (anti)differentiation. In this respect, Equation (3.20) is
indeed a fundamental result, because it eliminates the cumbersome labor of 
summation. 
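To see how much labor Equation (3.20) saves, the following Python sketch compares left-endpoint Riemann sums of g(s) = 1/(1 + s²) on (0, 1) with F(1) − F(0), where F = arctan is a primitive of g. The choice of g and the helper `left_riemann` are our own illustrative assumptions.

```python
import math


def left_riemann(g, a, b, n):
    """Left-endpoint Riemann sum of g over (a, b) with n equal subintervals."""
    ds = (b - a) / n
    return sum(g(a + k * ds) for k in range(n)) * ds


def g(s):
    return 1.0 / (1.0 + s * s)


F = math.atan          # a primitive of g: d/ds arctan(s) = 1/(1 + s^2)
a, b = 0.0, 1.0
exact = F(b) - F(a)    # Equation (3.20): no summation needed

for n in (10, 100, 1000, 10000):
    approx = left_riemann(g, a, b, n)
    print(f"n = {n:6d}  sum = {approx:.8f}  error = {abs(approx - exact):.2e}")
```

The error of the sum shrinks only roughly in proportion to 1/n, whereas the primitive gives π/4 in one step.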

Another useful result is 

$$\int_a^x g(s)\,ds = \int_a^x \frac{dG}{ds}(s)\,ds = \int_a^x dG. \qquad (3.21)$$


In words, the integral of the differential of a physical quantity is equal to the 
quantity evaluated at the upper limit minus the quantity evaluated at the lower 
limit. 


Example 3.2.1. The properties mentioned above can be very useful in evaluating
some integrals. Consider the integral $\int_{-\infty}^{\infty} e^{-t^2}\,dt$, whose value is known to be $\sqrt{\pi}$ (see
Example 3.3.1). We want to use this information to obtain the integral $\int_{-\infty}^{\infty} t^2 e^{-t^2}\,dt$.


First, we note that

$$\int_{-\infty}^{\infty} e^{-xt^2}\,dt = \sqrt{\frac{\pi}{x}}, \qquad x > 0.$$

This can be shown readily by changing the variable of integration to $u = \sqrt{x}\,t$ and
using the result of Example 3.3.1. Next, we differentiate both sides with respect to
x and use Box 3.2.2 with u = −∞ and v = ∞. We then get


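$$\text{LHS} = \frac{d}{dx}\int_{-\infty}^{\infty} e^{-xt^2}\,dt = \int_{-\infty}^{\infty} \frac{\partial}{\partial x} e^{-xt^2}\,dt = \int_{-\infty}^{\infty} (-t^2)\,e^{-xt^2}\,dt$$

for the LHS, and

$$\text{RHS} = \frac{d}{dx}\sqrt{\frac{\pi}{x}} = -\frac{1}{2}\,\frac{\sqrt{\pi}}{x^{3/2}}$$

for the RHS. So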


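$$\int_{-\infty}^{\infty} t^2 e^{-xt^2}\,dt = \frac{\sqrt{\pi}}{2}\,x^{-3/2} \qquad (3.22)$$

or, setting x = 1, $\int_{-\infty}^{\infty} t^2 e^{-t^2}\,dt = \sqrt{\pi}/2$.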

We can obtain more general results. Differentiating both sides of Equation 
(3.22), we obtain 


$$\int_{-\infty}^{\infty} t^4 e^{-xt^2}\,dt = \frac{3}{4}\sqrt{\pi}\,x^{-5/2} = \sqrt{\pi}\,\frac{1\cdot 3}{2^2}\,x^{-5/2}.$$


Continuing the process n times, we obtain 

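$$\int_{-\infty}^{\infty} t^{2n} e^{-xt^2}\,dt = \sqrt{\pi}\,\frac{1\cdot 3\cdot 5\cdots(2n-1)}{2^n}\,x^{-(2n+1)/2}.$$

In particular, if x = 1, we have

$$\int_{-\infty}^{\infty} t^{2n} e^{-t^2}\,dt = \sqrt{\pi}\,\frac{1\cdot 3\cdot 5\cdots(2n-1)}{2^n}. \qquad (3.23)$$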
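Equation (3.23), together with its x-dependent form just before it, can be checked numerically. This Python sketch rests on our own assumptions: it truncates the infinite range to (−L, L) with L = 12, where the neglected tails are far below double precision for the moments tested, and it uses the midpoint rule. `math.prod` requires Python 3.8 or later.

```python
import math


def gaussian_moment_numeric(n, x=1.0, L=12.0, steps=20000):
    """Midpoint-rule estimate of the integral of t**(2n) * exp(-x t**2) over (-L, L).

    Truncating the real line at |t| = L is an assumption; for the n and x
    used here the neglected tails are far below double precision.
    """
    dt = 2 * L / steps
    total = 0.0
    for k in range(steps):
        t = -L + (k + 0.5) * dt
        total += t ** (2 * n) * math.exp(-x * t * t)
    return total * dt


def gaussian_moment_formula(n, x=1.0):
    """sqrt(pi) * 1*3*5*...*(2n-1) / 2**n * x**(-(2n+1)/2), as derived in the text."""
    odd_product = math.prod(range(1, 2 * n, 2))   # empty product is 1 when n = 0
    return math.sqrt(math.pi) * odd_product / 2**n * x ** (-(2 * n + 1) / 2)


for n in range(4):
    print(n, gaussian_moment_numeric(n), gaussian_moment_formula(n))
```

The case n = 0 reproduces √(π/x), the starting point of the example.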


Example 3.2.2. Integrals involving only trigonometric functions are easy to 
evaluate: 


$$\int_a^b \sin t\,dt = -\cos t\,\Big|_a^b = \cos a - \cos b,$$
$$\int_a^b \cos t\,dt = \sin t\,\Big|_a^b = \sin b - \sin a.$$
However, integrals of the form $I = \int_a^b t^n \sin t\,dt$, in which n is a positive integer,
are not as easy to evaluate, although they occur frequently in applications. One can,
of course, evaluate these integrals using integration by parts, but that is a tedious
process. A more direct method of evaluation is to use the ideas developed above.

A pair of slightly more complicated trigonometric integrals which will be useful 
for our purposes is 


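$$\int_a^b \sin st\,dt = -\frac{1}{s}\cos st\,\Big|_a^b = \frac{\cos sa - \cos sb}{s},$$
$$\int_a^b \cos st\,dt = \frac{1}{s}\sin st\,\Big|_a^b = \frac{\sin sb - \sin sa}{s}. \qquad (3.24)$$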


If we differentiate both sides with respect to s, we can obtain the integrals we are
after.⁹ This is because each differentiation introduces one power of t in the integrand.
For example, if we are interested in I with n = 1, then we can differentiate the second
equation in (3.24):

$$\text{LHS} = \frac{d}{ds}\int_a^b \cos st\,dt = \int_a^b \frac{\partial}{\partial s}(\cos st)\,dt = -\int_a^b t\sin st\,dt.$$

On the other hand,

$$\text{RHS} = \frac{d}{ds}\left(\frac{\sin sb - \sin sa}{s}\right) = -\frac{\sin sb - \sin sa}{s^2} + \frac{b\cos sb - a\cos sa}{s}.$$

Setting s = 1 in these equations yields

$$\int_a^b t\sin t\,dt = \sin b - \sin a - b\cos b + a\cos a. \qquad (3.25)$$


We can also find the primitive of functions of the form $x^n \sin x$. All we need to
do is change b to x, as suggested by Equation (3.18). For example, the primitive
(indefinite integral) of $x\sin x$ is obtained by substituting x for b in Equation (3.25):

$$\int x\sin x\,dx = \sin x - \sin a - x\cos x + a\cos a = \sin x - x\cos x + C$$

because $-\sin a + a\cos a$ is simply a constant.
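A short Python check of Equation (3.25) and of the primitive sin x − x cos x, including the differentiation-with-respect-to-s step itself. The interval (0.3, 2.9), the step sizes, and the helper names are illustrative assumptions, not part of the example.

```python
import math


def simpson(f, a, b, n=1000):
    """Composite Simpson rule on (a, b); n must be even."""
    step = (b - a) / n
    total = f(a) + f(b)
    total += 4 * sum(f(a + k * step) for k in range(1, n, 2))
    total += 2 * sum(f(a + k * step) for k in range(2, n, 2))
    return total * step / 3


def closed_form(a, b):
    """Equation (3.25): integral of t sin t from a to b."""
    return math.sin(b) - math.sin(a) - b * math.cos(b) + a * math.cos(a)


def cos_integral(s, a, b):
    """Right-hand side of the second equation in (3.24)."""
    return (math.sin(s * b) - math.sin(s * a)) / s


def primitive(x):
    """sin x - x cos x, the primitive of x sin x with C = 0."""
    return math.sin(x) - x * math.cos(x)


a, b, eps = 0.3, 2.9, 1e-6
numeric = simpson(lambda t: t * math.sin(t), a, b)
# The s-derivative of (3.24) at s = 1 equals minus the integral of t sin t.
d_ds = (cos_integral(1 + eps, a, b) - cos_integral(1 - eps, a, b)) / (2 * eps)
print(numeric, closed_form(a, b), -d_ds)
```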


After a lull of almost two millennia, the subject of “exhaustion,” like any other form 
of human intellectual activity, was picked up after the Renaissance. Johannes Kepler 
is reportedly the first to begin work on finding areas, volumes, and centers of
gravity. He is said to have been attracted to such problems because he noted the
inaccuracy of the methods used by wine dealers to find the volumes of their kegs.

Among the results he obtained were relations between areas and perimeters.
For example, by regarding the area of a circle as covered by an infinite number
of triangles, each with a vertex at the center, he showed that the area of a circle is ½

9 We can set s = 1 at the end if need be. 
Bonaventura Cavalieri, 1598-1647

John Wallis, 1616-1703


its radius times its circumference. Similarly, he regarded the volume of a sphere as
the sum of a large number of small cones with vertices at the center. Since he knew
the volume of each cone to be ⅓ its height times the area of its base, he concluded
that the volume of a sphere should be ⅓ its radius times the surface area.

Galileo used the same technique as Kepler to treat uniformly accelerated
motion and essentially arrived at the formula $x = \tfrac{1}{2}at^2$. They both regarded an
area as the sum of infinitely many lines, and a volume as the sum of infinitely many
planes, without questioning the validity of manipulating infinities. Galileo regarded
a line as an indivisible element of area, and a plane as an indivisible element of
volume.

Influenced by the idea of “indivisibles,” Bonaventura Cavalieri, a pupil of 
Galileo and professor in a lyceum in Bologna, took up the study of calculus upon 
Galileo’s recommendation. He developed the ideas of Galileo and others on indivisibles
into a geometrical method and in 1635 published a book on the subject called
Geometry Advanced by a thus far Unknown Method, Indivisible of Continua. 

Cavalieri joined the religious order of the Jesuati in Milan in 1615 while he was still
a boy. In 1616 he transferred to the Jesuati monastery in Pisa. His interest in
mathematics was stimulated by Euclid’s works, and after meeting Galileo, he considered
himself a disciple of the astronomer. The meeting with Galileo was set up by Cardinal
Federico Borromeo, who saw clearly the genius in Cavalieri while he was at the
monastery in Milan.

Cavalieri was largely responsible for introducing logarithms as a computational 
tool in Italy. The tables of logarithms which he published included logarithms of 
trigonometric functions for use by astronomers. Cavalieri also wrote on conic sections,
trigonometry, optics, and astronomy. He showed by his methods of indivisibles
that, in the modern notation,

$$\int_0^a x^n\,dx = \frac{a^{n+1}}{n+1}$$

for positive integral values of n up to 9.

The next important step in the development of integral calculus began when
the seventeenth-century mathematicians generalized the Greek method of exhaustion.
Whereas this method requires different rectilinear approximations for different
geometrical figures, the new generation of mathematicians approximated the area
under any curve by a large number of rectangles of equal width (much as it is
done today), summed up the areas, and neglected the “small corrections” in the
sum. Using essentially this kind of summation technique, Fermat established the above
integral formula for all rational n except −1 before 1636.

Before Newton and Leibniz, the man who did most to replace the geometrical 
techniques with analytical ones in calculus was John Wallis. Although he did 
not begin to learn mathematics until he was about twenty, he became professor 
of geometry at Oxford and the ablest British mathematician of the century, next 
to Newton. One of Wallis’s notable results, obtained while he was trying to find
the area of a circle analytically, was a new formula for π. He calculated the area
bounded by the axes and the curves $y = (1 - x^2)^n$ for n = 0, 1, 2, .... Then by
interpolation and further complicated reasoning he related the area of a unit circle
$y = (1 - x^2)^{1/2}$ to the previous areas and showed that

$$\frac{\pi}{2} = \frac{2\cdot 2\cdot 4\cdot 4\cdot 6\cdot 6\cdot 8\cdot 8\cdots}{1\cdot 3\cdot 3\cdot 5\cdot 5\cdot 7\cdot 7\cdot 9\cdots}.$$
3.3 Guidelines for Calculating Integrals

The number of situations in which integrals are used is unlimited, and we shall
see many examples of such usage in this chapter and throughout the book. Before
embarking on specific examples, let us summarize some guidelines which
will be helpful in applying integrals in physical problems:

• Make sure you understand what physical quantity you are trying to 
calculate. Instead of searching randomly for formulas, think about the 
problem and let it determine the formulas. 

• Determine which coordinate system is most suited for the problem. 
Then place the origin and orient the axes in such a way that the
problem takes the simplest form. Usually spherical coordinates are suited
for regions of integration which are symmetric about a single point. If 
there is a natural “axis” associated with the problem, then cylindrical 
coordinates are useful, and if the region of integration is in the shape of 
a rectangular box, Cartesian coordinates may be most suitable. If there 
is no obvious symmetry, then any one of the systems is just as good (or 
just as bad). 

• Write down the local formula first, i.e., confine the problem to a small
region and write the formula, for instance, in terms of dQ(r'), dm(r'),
etc., then put the formula inside the integral. Do this in a coordinate-independent
way first. All physical laws are written with no reference
to a particular coordinate system, anyway.

• Now express the formula in terms of the coordinates you have chosen. 
When dealing with vector quantities, pay particular attention to unit 
vectors whose directions depend on the integration point. They cannot 
in general be taken out of the integral sign (see Section 3.3.2 for details). 

• Determine the limits of integration. In a typical situation, if you have 
chosen a good coordinate system, placed the origin properly, and
oriented the axes nicely, then the limits of integration should be easy to
write. 

• Never take anything out of the integral unless you are absolutely sure
that it is independent of the integration variables. This is easily said,
but just as easily forgotten.

• Once you have evaluated the integrals and found the physical quantity 
you are after, try to express your result in a coordinate-free language. 
This is not, in general, easy, but in special circumstances you can
immediately guess the coordinate-free form of the result.

• As a general rule—valid in all physical calculations—check your final 
answer for correct dimensions. The dimension of the LHS must match 
that of the RHS. 


3.3.1 Reduction to Single Integrals

Most integrals encountered in physics are multidimensional. Thus, it is important
to know how to evaluate multiple integrals. Let us concentrate on triple
integration, and for definiteness, let us assume that the integration variables
are actual coordinates in a Cartesian coordinate system.¹⁰ The most general
integral, namely Equation (3.4), will then be rewritten as

$$\int_\Omega f(\mathbf{r}, \mathbf{r}')\,dQ(\mathbf{r}') = \int_\Omega f(\mathbf{r}, x', y', z')\,dQ(x', y', z') = \iiint_V f(\mathbf{r}, x', y', z')\,\rho_Q(x', y', z')\,dx'\,dy'\,dz',$$

where we have reexpressed dQ in terms of some density. The region of integration
V may have to be divided into a number of other more easily integrable
regions. However, in most applications, by a good choice of the order of integration,
one can avoid such division. Let us assume that by integrating the
z' variable first, we will not need to divide the region. The z' integration is a
single integral and is done by keeping x' and y' constant. To find the upper
limit of this integral, we pick an arbitrary point¹¹ in the region, fix its first
two coordinates, and move “up” until we hit the boundary of V at a point. The
third coordinate of this boundary point, when expressed in terms of x' and
y', will be the upper limit of the z' integration. The lower limit is obtained
similarly. In most cases, V is bounded by a given upper surface of the form
z = g(x, y), and a lower surface of the form z = h(x, y), as shown in Figure 3.3.



Figure 3.3: The limits of the first integration of a triple integral are defined by two 
surfaces. 


10 Recall that the integration variables, although considered as “coordinates of a point,”
need not designate an actual geometric point in space. They could, for instance, be a set of
thermodynamical variables describing a thermodynamical system.

11 A common mistake at this stage is to pick a special point. To make sure that you have 
picked an arbitrary point, go through the following process using the point chosen, then 
pick a different point, go through the process and see if you obtain the same result for the 
upper and lower limits of the integral. 
Thus, since the first two coordinates of the boundary points are x' and y',
the upper limit will be g(x', y') and the lower limit will be h(x', y'). We thus
write the integral as


$$\int_\Omega f(\mathbf{r}, \mathbf{r}')\,dQ(\mathbf{r}') = \iint_S dx'\,dy' \int_{h(x', y')}^{g(x', y')} f(\mathbf{r}, x', y', z')\,\rho_Q(x', y', z')\,dz',$$


where S is the projection of V on the xy-plane. For S to be useful, it must
have the following property: Every point of the upper and lower boundaries of 
V has one and only one image in S, and no two points of the upper (or lower) 
boundary project onto the same point in S. If this property is not fulfilled, 
then we must choose another coordinate as our first integration variable, or, 
if this does not work, divide the region of integration into pieces for each one 
of which this property holds. 

Let us assume that the property holds for S, and that we can do the
integral in z'. The result of this integration is a complete elimination of the
z'-coordinate and the reduction of the triple integral down to a double integral.
To be more specific, assume that the primitive of the integrand, as a function
of z', is F(r, x', y', z'), i.e., that

$$\frac{\partial F}{\partial z'} = f(\mathbf{r}, x', y', z')\,\rho_Q(x', y', z').$$

Then, the z' integration yields 

$$\int_{h(x', y')}^{g(x', y')} f(\mathbf{r}, x', y', z')\,\rho_Q(x', y', z')\,dz' = F(\mathbf{r}, x', y', g(x', y')) - F(\mathbf{r}, x', y', h(x', y')) \equiv G(\mathbf{r}, x', y'),$$


where the last line defines G. The triple integration has now been reduced to 
a double integral, and we have 


$$\int_\Omega f(\mathbf{r}, \mathbf{r}')\,dQ(\mathbf{r}') = \iint_S dx'\,dy'\,G(\mathbf{r}, x', y').$$


We follow the same procedure as above to do the double integral. Once 
again, the region of integration S may have to be divided into a number of 
other more easily integrable regions. However, let us assume that by integrating
the x' variable first, we will not need to divide the region. The x'
integration is again a single integral and is done by keeping y' constant. To
find the upper limit of this integral, we pick an arbitrary point in S, fix its
second coordinate, and move “to the right” until we hit the boundary of S at
a point. The first coordinate of this boundary point, when expressed in terms
of y', will be the upper limit of the x' integration. The lower limit is obtained
by “moving to the left.” Once again, in most cases, S is bounded by a given
right-hand curve of the form x = v(y), and a left-hand curve of the form x = u(y) (see
Figure 3.4). Thus, since the second coordinate of both boundary points is y',
Figure 3.4: The limits of the second integration of a triple integral are defined by two
curves.


the upper limit will be v(y') and the lower limit will be u(y'). We thus write 
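the integral as

$$\int_\Omega f(\mathbf{r}, \mathbf{r}')\,dQ(\mathbf{r}') = \int_I dy' \int_{u(y')}^{v(y')} G(\mathbf{r}, x', y')\,dx',$$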
where I is the projection of S on the y-axis. For I to be useful, it must have
the same property as S, namely: Every point of the right and left boundaries
of S has one and only one image in I, and no two points of the right (or left)
boundary project onto the same point in I. If this property is not fulfilled,
then we must choose y' as our first integration variable, or, if this does not
work, divide the region of integration into pieces for each one of which this
property holds. Assuming that I satisfies this property, and that the primitive
of the integrand, as a function of x', is W(r, x', y'), i.e., that

$$\frac{\partial W}{\partial x'} = G(\mathbf{r}, x', y'),$$

we get

$$\int_{u(y')}^{v(y')} G(\mathbf{r}, x', y')\,dx' = W(\mathbf{r}, v(y'), y') - W(\mathbf{r}, u(y'), y') \equiv H(\mathbf{r}, y'),$$

where the last line defines H. The triple integration has now been reduced to 
a single integral, and we have

$$\int_\Omega f(\mathbf{r}, \mathbf{r}')\,dQ(\mathbf{r}') = \int_a^b H(\mathbf{r}, y')\,dy',$$

where a and b are the end points of the interval I. 
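The reduction to single integrals maps directly onto nested one-dimensional quadratures. In this Python sketch the region V is a hypothetical example chosen so that the answer is easy to verify: I = (0, 1) for y', u(y') = 0 ≤ x' ≤ v(y') = y', and h = 0 ≤ z' ≤ g = x' + y', with f ρ_Q = 1, so the triple integral is the volume ∫₀¹∫₀^(y') (x' + y') dx' dy' = 1/2. Simpson's rule is exact for the low-degree polynomials that appear at each stage.

```python
def simpson(f, a, b, n=20):
    """Composite Simpson rule on (a, b); n must be even."""
    step = (b - a) / n
    total = f(a) + f(b)
    total += 4 * sum(f(a + k * step) for k in range(1, n, 2))
    total += 2 * sum(f(a + k * step) for k in range(2, n, 2))
    return total * step / 3


# Hypothetical region V (not from the text):
#   lower/upper surfaces  z' = h(x', y') = 0  and  z' = g(x', y') = x' + y'
#   left/right curves     x' = u(y') = 0      and  x' = v(y') = y'
#   interval I            a = 0 <= y' <= b = 1
def g(x, y):
    return x + y


def h(x, y):
    return 0.0


def u(y):
    return 0.0


def v(y):
    return y


def integrand(x, y, z):
    """f(r, x', y', z') * rho_Q(x', y', z'); the constant 1 gives the volume of V."""
    return 1.0


def G(x, y):
    """First (z') integration, between the lower and upper surfaces."""
    return simpson(lambda z: integrand(x, y, z), h(x, y), g(x, y))


def H(y):
    """Second (x') integration, between the left and right curves."""
    return simpson(lambda x: G(x, y), u(y), v(y))


a, b = 0.0, 1.0
volume = simpson(H, a, b)   # last (y') integration over I
print(volume)               # close to the exact value 1/2
```

The names G and H mirror the intermediate functions defined in the text; each call to a quadrature routine corresponds to one single integral.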

Sometimes the inverse of the foregoing operation, whereby a single
integral is turned into a multiple integral, is useful. This happens when the integrand
is given in terms of an integral. To be specific, suppose that in the integral

$$I = \int_a^b g(x)\,dx,$$

g(x) is given by

$$g(x) = \int_u^v h(x, t)\,dt,$$


where u and v could be functions of x. Then, the original integral can be 
written as 


$$I = \int_a^b \left\{\int_u^v h(x, t)\,dt\right\} dx = \int_a^b \int_u^v h(x, t)\,dt\,dx.$$


Example 3.3.1. A historical example of this inverse operation is the evaluation 
of the integral 

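$$I = \int_0^\infty e^{-x^2}\,dx.$$

As the reader attempting to solve this integral will soon find out, it is impossible to
find a primitive of the integrand in terms of elementary functions. However, with

$$I^2 = \left(\int_0^\infty e^{-x^2}\,dx\right)\left(\int_0^\infty e^{-y^2}\,dy\right) = \int_0^\infty\!\!\int_0^\infty e^{-(x^2+y^2)}\,dx\,dy$$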


we end up with an integration over the first quadrant of the xy-plane, which opens up
the possibility of using other coordinate systems. In polar coordinates, the integrand
becomes $e^{-r^2}$ and the Cartesian element of area dx dy becomes the element of area
in polar coordinates, namely r dr dθ. The limits of integration correspond to the
first quadrant, with the range of θ being from 0 to π/2 and that of r being from 0
to infinity. This leads to


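$$I^2 = \int_0^{\pi/2}\!\!\int_0^\infty e^{-r^2}\,r\,dr\,d\theta = \int_0^{\pi/2} d\theta \int_0^\infty e^{-r^2}\,r\,dr = \frac{\pi}{2}\left(-\frac{1}{2}e^{-r^2}\right)\Big|_0^\infty = \frac{\pi}{4}.$$

This shows that $I^2 = \pi/4$ and, therefore, $I = \sqrt{\pi}/2$. The reader may verify that

$$\int_{-\infty}^{\infty} e^{-x^2}\,dx = \sqrt{\pi} \qquad (3.26)$$

by either invoking the evenness of the integrand or starting from scratch as done
above. ■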
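A numerical cross-check of Example 3.3.1, written as a Python sketch: the truncation point L = 10 and the midpoint-rule helper are our own assumptions. The one-dimensional integral is computed directly, while I² uses the polar-coordinate form, in which the r integral has the elementary primitive −e^(−r²)/2.

```python
import math


def midpoint(f, a, b, n):
    """Composite midpoint rule for the integral of f over (a, b)."""
    step = (b - a) / n
    return sum(f(a + (k + 0.5) * step) for k in range(n)) * step


L = 10.0   # truncation of (0, infinity); exp(-L**2) is about 4e-44
I = midpoint(lambda x: math.exp(-x * x), 0.0, L, 20000)

# Polar form: the theta integral over (0, pi/2) gives pi/2, and the integral
# of r exp(-r**2) over (0, L) follows from the primitive -exp(-r**2) / 2.
I_squared_polar = (math.pi / 2) * 0.5 * (1.0 - math.exp(-L * L))

print(I, math.sqrt(math.pi) / 2)
print(I * I, I_squared_polar)
```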


3.3.2 Components of Integrals of Vector Functions 

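Many calculations involve an integrand which is a vector and whose integration
also leads to a vector. Let us write this as

$$\mathbf{F}(\mathbf{r}) = \int_\Omega \mathbf{A}(\mathbf{r}, \mathbf{r}')\,dQ(\mathbf{r}') = \int_\Omega \left[A_1(\mathbf{r}, \mathbf{r}')\,\hat{\mathbf{e}}_1(\mathbf{r}') + A_2(\mathbf{r}, \mathbf{r}')\,\hat{\mathbf{e}}_2(\mathbf{r}') + A_3(\mathbf{r}, \mathbf{r}')\,\hat{\mathbf{e}}_3(\mathbf{r}')\right] dQ(\mathbf{r}'),$$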


where $A_1$, $A_2$, and $A_3$ are the components of the vector A along the mutually
perpendicular unit vectors $\hat{\mathbf{e}}_1$, $\hat{\mathbf{e}}_2$, and $\hat{\mathbf{e}}_3$, respectively.¹² Note that these unit
vectors are, in general, functions of the variables of integration, and that


Box 3.3.1. The geometry of the distribution of the source determines the 
most convenient variables of integration (coordinate variables). 


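To find the component of F(r) along any unit vector $\hat{\mathbf{e}}_\alpha$, one simply takes
the dot product of F(r) with $\hat{\mathbf{e}}_\alpha$. Since $\hat{\mathbf{e}}_\alpha$ does not depend on the variables
of integration, it can be moved inside the integral. Thus,

$$F_\alpha(\mathbf{r}) = \hat{\mathbf{e}}_\alpha\cdot\mathbf{F}(\mathbf{r}) = \hat{\mathbf{e}}_\alpha\cdot\int_\Omega \mathbf{A}(\mathbf{r}, \mathbf{r}')\,dQ(\mathbf{r}') = \int_\Omega \left[\hat{\mathbf{e}}_\alpha\cdot\mathbf{A}(\mathbf{r}, \mathbf{r}')\right] dQ(\mathbf{r}')$$
$$= \int_\Omega \left[A_1(\mathbf{r}, \mathbf{r}')f_1(\mathbf{r}') + A_2(\mathbf{r}, \mathbf{r}')f_2(\mathbf{r}') + A_3(\mathbf{r}, \mathbf{r}')f_3(\mathbf{r}')\right] dQ(\mathbf{r}'), \qquad (3.27)$$

where $f_1(\mathbf{r}') = \hat{\mathbf{e}}_\alpha\cdot\hat{\mathbf{e}}_1$, $f_2(\mathbf{r}') = \hat{\mathbf{e}}_\alpha\cdot\hat{\mathbf{e}}_2$, and $f_3(\mathbf{r}') = \hat{\mathbf{e}}_\alpha\cdot\hat{\mathbf{e}}_3$. Once these dot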
products are expressed in terms of the variables of integration, the integral 
becomes an ordinary integral which, in principle, can be performed using the 
guidelines above. 


Box 3.3.2. In practice, e a is one of the unit vectors of some convenient 
coordinate system which need not be the same as the coordinate system 
used for integration. 


For example, one may be interested in the Cartesian components of the gravitational
field of a spherical distribution of mass. In that case, one uses spherical
coordinates for integration and for the unit vectors inside the integral, and
$\hat{\mathbf{e}}_x$, $\hat{\mathbf{e}}_y$, or $\hat{\mathbf{e}}_z$ for $\hat{\mathbf{e}}_\alpha$. We shall illustrate this point extensively with numerous
examples scattered throughout this chapter.
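The following Python sketch illustrates Equation (3.27) on a hypothetical case not taken from the text: the vector integral of the radial unit vector over the upper hemisphere of radius R, with dA = R² sin θ dθ dφ. Because the radial unit vector changes direction over the surface, it cannot be pulled out of the integral; instead each Cartesian component is found from its dot product with that unit vector, namely sin θ cos φ, sin θ sin φ, or cos θ. The expected result is (0, 0, πR²); the radius, grid size, and helper names are our own choices.

```python
import math


def midpoint2d(f, a1, b1, a2, b2, n=200):
    """Composite midpoint rule for f(s, t) over the rectangle (a1, b1) x (a2, b2)."""
    h1 = (b1 - a1) / n
    h2 = (b2 - a2) / n
    total = 0.0
    for i in range(n):
        s = a1 + (i + 0.5) * h1
        for j in range(n):
            total += f(s, a2 + (j + 0.5) * h2)
    return total * h1 * h2


R = 2.0   # hemisphere radius (arbitrary choice)


def e_r(theta, phi):
    """Dot products of e_x, e_y, e_z with the radial unit vector at (theta, phi)."""
    return (math.sin(theta) * math.cos(phi),
            math.sin(theta) * math.sin(phi),
            math.cos(theta))


def component(alpha):
    """Integral over the upper hemisphere of (e_alpha . e_r) dA; alpha = 0, 1, 2 for x, y, z."""
    def integrand(theta, phi):
        return e_r(theta, phi)[alpha] * R**2 * math.sin(theta)   # dA = R^2 sin(theta) dtheta dphi
    return midpoint2d(integrand, 0.0, math.pi / 2, 0.0, 2 * math.pi)


F = [component(alpha) for alpha in range(3)]
print(F, math.pi * R**2)
```

Pulling the radial unit vector out of the integral would instead suggest a vector of length 2πR² along a meaningless fixed direction, which is exactly the mistake the guidelines warn against.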

By the time Newton entered the scene, an immense amount of knowledge of calculus 
had accumulated. In his book Lectiones Geometricae, Barrow, for example, shows 
a method of finding tangents, theorems on the differentiation of products and quotients
of functions, change of variables in a definite integral, and even differentiation
of implicit functions. So, why, one may wonder, is the word “calculus” so much 
attached to Newton and Leibniz? The answer is in these two men’s recognition of 
the generality of the methods of calculus, and, more importantly, their emphasis on 
the newly discovered analytic geometry. 

Isaac Newton was born in the hamlet of Woolsthorpe, England, two months 
after his father’s death. His mother, in need of help for the management of the family
farm, wanted Isaac to pursue a farming career. However, Isaac’s uncle persuaded
him to enter Trinity College, Cambridge University. Newton took the entrance exam 

12 These unit vectors are usually those of a convenient coordinate system. 
and was accepted to the College in 1661 with a deficiency in Euclidean geometry.
Apparently receiving very little stimulation from his teachers, except possibly Barrow,
he studied Descartes’s Géométrie, as well as the works of Copernicus, Kepler,
Galileo, Wallis, and Barrow, by himself.

Upon his graduation, Newton had to leave Cambridge because of the plague that was
widespread in the London area, and he spent the next eighteen months, during 1665 and
1666, in the quiet of his family farm at Woolsthorpe. These eighteen months were
the most productive of his (as well as any other scientist’s) life. In his own words:

In the beginning of 1665 I found the ... rule for reducing any dignity 
[power] of binomial to a series.¹³ The same year, in May, I found the method
of tangents ... and in November the direct method of Fluxions [the elements 
of what is now called differential calculus], and the next year in January had 
the theory of Colours, and in May following I had entrance into the inverse 
method of Fluxions [integral calculus], and in the same year I began to think 
of gravity extending to the orb of the Moon ... and ... compared the force 
requisite to keep the Moon in her orb with the force of gravity at the surface 
of the Earth. 

Newton spent the rest of his scientific life developing and refining the ideas 
conceived at his family farm. At the age of 26 he became the second Lucasian 
professor of mathematics at Cambridge replacing Isaac Barrow who stepped aside 
in favor of Newton. At 30 he was elected a Fellow of the Royal Society, the highest 
scientific honor in England. 

Newton often worked until early morning, kept forgetting to eat his meals, and
when he appeared, once in a while, in the dining hall of the college, his shoes
were down at the heels, stockings untied, and his hair scarcely combed. Being 
always absorbed in his thoughts, he was very naive and impractical concerning 
daily routines. It is said that once he made a hole in the door of his house for his 
cat to come in and out. When the cat had kittens, he added some smaller holes in 
the door! 

Newton did not have a pleasant personality, and was often involved in controversy
with his colleagues. He quarreled bitterly with Robert Hooke (founder of the
theory of elasticity and the discoverer of Hooke’s law) concerning his theory of color
as well as priority in the discovery of the universal law of gravitation. He was also
involved in another priority squabble with the German mathematician Gottfried Leibniz
over the development of calculus. With Christian Huygens, the Dutch physicist,
he got into an argument over the theory of light. Astronomer John Flamsteed, who 
was hardly on speaking terms with Newton, described him as “insidious, ambitious, 
excessively covetous of praise, and impatient of contradictions ... a good man at the 
bottom but, through his nature, suspicious.” 

De Morgan says that “a morbid fear of opposition from others ruled his whole 
life.” Because of this fear of criticism, Newton hesitated to publish his works. 
When in 1672 he did publish his theory of light and his philosophy of science, he 
was criticized by his contemporaries. Newton decided not to publish in the future, 
a decision that had to be abandoned frequently. 

His theory of gravity, although germinated in 1665 under the influence of works 
by Hooke and Huygens, was not published until much later, partly because of his 
fear of criticism. Another reason for this hesitance in publishing this result was his 

13 Newton is talking about the binomial theorem here. 



Isaac Newton 
1642-1727 
lack of proof that the gravitational attraction of a solid sphere acts as if the sphere’s
mass were concentrated at the center. So, when his friend Edmund Halley urged
him in 1684 to publish his results, he refused. However, in 1685 he showed that 
a sphere whose density varies only with distance to the center does in fact attract 
particles as though its mass were concentrated at the center, and agreed to write up 
his work. Halley then assisted Newton editorially and paid for the publication. The 
first edition of Philosophiae Naturalis Principia Mathematica appeared in 1687, and 
the Newtonian age began. 


3.4 Problems 

3.1. Use Equation (3.7) to show that $\int_a^a f(t)\,dt = 0$.

3.2. In Equation (3.8), it was assumed that p < q < r. Show that the 
equation holds even if q is not between p and r. 

3.3. For each of the following integrals make the given change of variables: 


( a ) fo tdt, t = y 3 . (b) f ( 


1 dt 

o T+F’ 


t = tan y, 0 < y < 7 r/ 2 . 


r°° t dt 

i+t 3 ’ 


t = 


(c) fo TTt> t = ln v■ (d) JT i +t o, - y- 

3.4. By a suitable change of variables, show the following integral identities: 

( a ) IZ ( aZZ 2 = ^ fo /2 cos * dt ■ ( b ) iT IW = /o dL 

3.5. If 


/»sin(7ra;) 

g(x)= / {cos[ 7 r(f + x)}} e~ fi sin 2 K^/ 2 ) Mtad+t)] dt, 

Jx 2 -1 


find g'( 1). 

3.6. Suppose that F(x) = J^° sx e xt2 dt, G(x) = J^° sx t 2 e xt2 dt, and H(x) = 
G(x) — F'(x). Find H(x) in terms of elementary functions. Show that 
H( tt/4) = e^/s/v^. 

3.7. Suppose that $F(x) = \int_0^{\sin x} \ln(\cos^2 x + t^2 + 1)\,dt$, $G(x) = \int_0^{\sin x} (\cos^2 x + t^2 + 1)^{-1}\,dt$,
and $H(x) = F'(x) + 2\sin x\cos x\,G(x)$. Find H(x) in terms of
elementary functions. Show that $H(\pi/3) = (\ln 2)/2$.

3.8. Evaluate the derivative of the following integrals with respect to x at 
the given values of x: 

(a) J x e~ t2 dt at x = l. (b) f x 3 costdt at x = it. 

(c) f/j s(x/3) e- t2 dt at 


x = 7 r. 


(d) f x cos (y/s) ds at X = 7T. 
3.9. Find the numerical value of the derivative of the following two integrals
at x = 1 : 


(a) J o ln - e -x(t 2 - 2 )^ 


(b) l 


x -\-a— 1 


sin 


nxe 

7* 


2 e -( X 2 + a ~i V 


dt. 


3.10. Write the derivatives with respect to x of the following integrals in 
terms of other integrals. Do not try to evaluate the integrals. 

(a) ln(l + sx) ds. (b) J* t 2+ x 2 • (c) f* Vx 2 + a 2 - 2 ax cos t dt. 

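3.11. Differentiate $\int_{-\infty}^{\infty} dt/(z + t^2) = \pi/\sqrt{z}$ with respect to z to show that

(a) $\displaystyle\int_{-\infty}^{\infty} \frac{dt}{(1+t^2)^2} = \frac{\pi}{2}$, (b) $\displaystyle\int_{-\infty}^{\infty} \frac{dt}{(1+t^2)^3} = \frac{3\pi}{8}$.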


3.12. Using the method of Example 3.2.2, find the following integrals: 


(a) f^ t 2 sin t dt. (b) / Q 6 1 3 sin t dt. (c) f ^ t 4 sin t dt. 

(d) f^ t 2 cos t dt. (e) f^ t 3 cos t dt. (f) f^ t 4 cos t dt. 


In each case calculate the primitive of the integrand and verify your answer 
by differentiating the primitive. 


3.13. Find the integral

$$\Gamma(n+1) \equiv \int_0^\infty t^n e^{-t}\,dt$$

by first evaluating the integral

$$\int_0^\infty e^{-xt}\,dt$$

and then differentiating the result n times, and setting x = 1 at the end. Can
you see why Γ(n + 1) is called the factorial function?



3.14. Sketch each of the following integrands to decide whether the approximation
to the integral is good or not.

(a) $\int_{-0.1}^{0.1} \frac{dt}{10 + t^2} \approx 0.02$. (b) $\int_{-0.1}^{0.1} \frac{dt}{0.001 + t^2} \approx 200$.

(c) $\int_{-0.1}^{0.1} \cos(5\pi x)\,dx \approx 0.2$.


( d ) L o.i cos ff dx ~ °- 2 - 

(e) /_“ e~ wot2 dt m 0 . 2 . (f) ff^ e~ t2 / wo dt w 0 . 2 . 


3.15. Show that if a function is even (odd), then its derivative is odd (even). 


3.16. Use the result of Example 3.3.1 to show that 
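3.17. By differentiating the electrostatic potential

$$\Phi(\mathbf r) = \int_\Omega \frac{k_e\,dq(\mathbf r')}{|\mathbf r - \mathbf r'|}$$

with respect to x, y, and z, and assuming that $dq(\mathbf r')$ is independent of x, y, and
z, show that the electric field

$$\mathbf E(\mathbf r) = -\nabla\Phi = -\frac{\partial\Phi}{\partial x}\,\mathbf e_x - \frac{\partial\Phi}{\partial y}\,\mathbf e_y - \frac{\partial\Phi}{\partial z}\,\mathbf e_z$$

can be written as

$$\mathbf E(\mathbf r) = \int_\Omega \frac{k_e\,dq(\mathbf r')}{|\mathbf r - \mathbf r'|^3}\,(\mathbf r - \mathbf r').$$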





Chapter 4 


Integration: Applications 


The preceding chapter introduced integration and dealt with its formal
aspects. It also gave some general guidelines concerning the calculation and
manipulation of integrals, in particular how to reduce the process of multiple
integration to a number of single integrations. In this chapter, we apply the
formalism of the previous chapter to concrete examples.


4.1 Single Integrals 

This section is devoted to the simple but important case of single integrals 
with examples from mechanics, electrostatics and gravity, and magnetostatics. 
Generally, we encounter problems which are defined and set up in a single 
dimension leading to integrals that have a single variable to be integrated. 


4.1.1 An Example from Mechanics 


In our discussion of primitives, Equation (3.18) shows that integration
can be interpreted as the inverse of differentiation. Thus, if we know the
functional form of the derivative of a quantity, we should be able to express
the quantity in terms of an integral.

Velocity is the derivative of displacement. So, we seek to write displacement
in terms of an integral of velocity. This is easily done as follows:¹

$$\frac{d\mathbf r}{dt} = \mathbf v(t) \;\Rightarrow\; \frac{d\mathbf r}{ds} = \mathbf v(s) \;\Rightarrow\; d\mathbf r = \mathbf v(s)\,ds.$$

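Integrating both sides from 0 to t, we get

$$\int_0^t \frac{d\mathbf r}{ds}\,ds = \int_0^t \mathbf v(s)\,ds \;\Rightarrow\; \mathbf r(t) - \mathbf r_0 = \int_0^t \mathbf v(s)\,ds, \tag{4.1}$$

where $\mathbf r_0 = \mathbf r(0)$, and we used Equation (3.21).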

1 As cautioned below, we change t to s because we anticipate using t as the upper limit 
of the integral. 
important caution!


There is an alternative derivation of the last formula which relies directly
on the definition of the integral. Since the velocity of the particle is changing,
we cannot find the displacement by simple multiplication with time. However,
if we divide the time interval (from 0 to t) into N small subintervals,
and concentrate on the motion of the particle in each subinterval, then each
displacement can be approximated by the product of velocity and the small
time interval, and the total displacement $\mathbf r(t) - \mathbf r_0$ will be simply the sum of
all such displacements. This is summarized as

$$\mathbf r(t) - \mathbf r_0 \approx \sum_{i=1}^{N} \mathbf v(s_i)\,\Delta s_i,$$

which, in the limit of larger and larger N, gives

$$\mathbf r(t) - \mathbf r_0 = \int_0^t \mathbf v(s)\,ds.$$
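The limiting argument above can be watched numerically. The following Python sketch is only an illustration under assumed inputs: the one-dimensional velocity v(s) = 3s² and the helper name `displacement_riemann` are hypothetical and do not come from the text. It evaluates the sum of v(s_i)Δs_i over N equal subintervals, with s_i taken at the left end of each subinterval, and compares it with the exact displacement t³.

```python
# Left Riemann sum approximating r(t) - r0 = integral of v(s) ds over [0, t].


def displacement_riemann(v, t, n):
    """Sum v(s_i) * ds over n equal subintervals of [0, t],
    where s_i is the left endpoint of the i-th subinterval."""
    ds = t / n
    return sum(v(i * ds) * ds for i in range(n))


def v(s):
    # Hypothetical velocity component; its exact displacement over [0, t] is t**3.
    return 3.0 * s**2


t = 2.0
exact = t**3
for n in (10, 100, 1000, 10000):
    approx = displacement_riemann(v, t, n)
    print(f"N = {n:5d}  sum = {approx:.6f}  error = {abs(approx - exact):.2e}")
```

With a left-endpoint sum the error shrinks roughly in proportion to 1/N; other choices of s_i inside each subinterval (the midpoint, for instance) converge to the same integral, which is the content of the limit N → ∞.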

Notice how careful we have been to avoid using the same variable for 
integration as well as the limit of integration. This is a practice the reader 
should constantly keep in mind. As a rule 


Box 4.1.1. (Caution!) Never use the same symbol for the variable of
an integral and its limits, or for the variables of two integrals when one
integral appears in the integrand of the other.


The following example is a good illustration of the significance of the concept 
of an integral and the rule in the Box above. 


Example 4.1.1. In mechanics, Newton’s second law places special importance 
on acceleration, 2 and a knowledge of acceleration is normally sufficient to solve a 
mechanical problem, i.e., find displacement as a function of time. A particular 
example of this situation is when acceleration is known as a function of time, in 
which case we can immediately find the velocity in exact analogy with Equation 
(4.1). We thus have 

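$$\mathbf v(t) - \mathbf v_0 = \int_0^t \mathbf a(s)\,ds \;\Rightarrow\; \mathbf v(t) = \mathbf v_0 + \int_0^t \mathbf a(s)\,ds.$$

Notice how the argument of v is the same as the upper limit of integration. Now that
we have velocity, we can substitute it in Equation (4.1) to find the displacement.
This gives

$$\mathbf r(t) - \mathbf r_0 = \int_0^t \left[\mathbf v_0 + \int_0^s \mathbf a(u)\,du\right] ds$$

or

$$\mathbf r(t) = \mathbf r_0 + \mathbf v_0 t + \int_0^t ds \int_0^s \mathbf a(u)\,du.$$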


2 Because the second law of motion connects acceleration and the cause of motion, force. 
Figure 4.1: The region of integration for calculating position as a double integral.


In the double integral, it is understood that the u-integration is to be done first, 
followed by the s-integration. As the last double integral suggests, the region of 
integration, in the us-plane, is a right triangle bounded by the vertical axis (the s- 
axis, or u = 0), the line u = s, and the horizontal line s = t as shown in Figure 4.1. 
It is convenient, in this case, to change the order of integration. The lower limit of 
the s-integral—the first integration—is u and the upper limit is t. Once this integral 
is done, the u-integral goes from 0 to t, as can easily be verified. We, therefore, have


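$$\mathbf r(t) = \mathbf r_0 + \mathbf v_0 t + \int_0^t du \int_u^t \mathbf a(u)\,ds = \mathbf r_0 + \mathbf v_0 t + \int_0^t \mathbf a(u)\,du \int_u^t ds \tag{4.2}$$

$$= \mathbf r_0 + \mathbf v_0 t + \int_0^t \mathbf a(u)\,(t - u)\,du = \mathbf r_0 + \mathbf v_0 t + t\int_0^t \mathbf a(u)\,du - \int_0^t u\,\mathbf a(u)\,du.$$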

It is instructive for the reader to show that the first derivative of this expression 
gives the velocity and the second derivative the acceleration. ■ 
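A quick numerical check of Equation (4.2) is to evaluate the double integral in its original order and the single integral obtained after changing the order, and compare both with a case solvable by hand. The sketch below assumes one-dimensional motion with the hypothetical data a(u) = cos u, r₀ = 1, v₀ = 2, for which r(t) = r₀ + v₀t + 1 − cos t; the helper names are illustrative, and Simpson's rule is simply a convenient quadrature, not something prescribed by the text.

```python
import math


def simpson(f, lo, hi, n=200):
    """Composite Simpson's rule for the integral of f over [lo, hi];
    n is the number of subintervals (bumped up to an even number)."""
    if n % 2:
        n += 1
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3


def position_single(acc, t, r0=0.0, v0=0.0):
    """Eq. (4.2): r(t) = r0 + v0 t + integral_0^t a(u) (t - u) du."""
    return r0 + v0 * t + simpson(lambda u: acc(u) * (t - u), 0.0, t)


def position_double(acc, t, r0=0.0, v0=0.0):
    """Original order: r(t) = r0 + v0 t + integral_0^t ds integral_0^s a(u) du."""
    velocity_change = lambda s: simpson(acc, 0.0, s)   # integral_0^s a(u) du
    return r0 + v0 * t + simpson(velocity_change, 0.0, t)


# Hypothetical data: a(u) = cos u, r0 = 1, v0 = 2.
t = 1.5
print(position_single(math.cos, t, 1.0, 2.0))
print(position_double(math.cos, t, 1.0, 2.0))
print(1.0 + 2.0 * t + 1.0 - math.cos(t))   # hand-computed value
```

The single-integral form needs one quadrature instead of a nested pair, which is the practical payoff of changing the order of integration.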


Two men are credited with the invention of calculus, Newton and Leibniz. Of course,
as we have seen, the “invention” of calculus was a long process involving many
generations of mathematicians. Nevertheless, Newton and Leibniz made great
contributions to the subject and gave it a prominent role in the subsequent evolution of
mathematical thought.

Gottfried Wilhelm Leibniz studied law and, after defending a thesis in logic,
received a Bachelor of Philosophy degree. He wrote a second thesis on a universal
method of reasoning in 1666 which completed his work for a doctorate in
philosophy at the University of Altdorf and qualified him for a professorship. During the
years 1670 and 1671, Leibniz wrote his first papers on mechanics and produced his
calculating machine.

Leibniz was also involved in the politics of his time. In March, 1672, he went to 
Paris on a political mission as an ambassador of the Elector of Mainz. While in Paris, 
he made contact with notable mathematicians and scientists including Huygens. This 
stirred up his interest in mathematics, a subject that he knew nothing about prior 
to 1672. In 1673 he went to London and met other scientists and mathematicians 
including the secretary of the Royal Society of London. 

While making his living as a diplomat, he delved further into mathematics and
read Descartes and Pascal. In 1676 Leibniz was appointed librarian and councilor to
the Elector of Hanover. Twenty-four years later the Elector of Brandenburg invited
Leibniz to work for him in Berlin. While involved in many political maneuvers,
including the succession of George Ludwig of Hanover to the English throne, Leibniz
worked in many fields and his side activities encompassed an enormous range. He
died in 1716, undeservedly neglected.

given a definite
double integral,
one can
reconstruct the
region of
integration in a
plane.

Gottfried Wilhelm
Leibniz 1646-1716

In addition to being a diplomat, Leibniz was a philosopher, lawyer, historian,
and pioneer geologist. He did important work in logic, mathematics, optics,
mechanics, hydrostatics, nautical science, and calculating machines. Although law was
his profession, his contributions to mathematics and philosophy are among the best.
He tried endlessly to reconcile the Catholic and Protestant faiths. He founded the
Berlin Academy in 1700. He criticized the universities for being “monkish” and
charged that they possessed learning but no judgment and were absorbed in trifles.
Instead he urged that true knowledge (mathematics, physics, chemistry, anatomy,
botany, zoology, history, and geography) be pursued. He favored the German
language over Latin because Latin was tied to the older, useless thought. Men mask
their ignorance, he said, by using the Latin language to impress people.

His numerous mathematical notes on differentiation and integration are full of
novel ideas. His notations were quite ingenious: he introduced the notation dy/dx
for the derivative and ∫ for the integral. He recognized the operations of integration
and differentiation as inverses of one another.


4.1.2 Examples from Electrostatics and Gravity 

In electrostatics or magnetostatics, one is sometimes interested in calculating
the electric or magnetic field of a linear charge or current distribution. In
electrostatics, one can imagine sprinkling electric charges on a thin piece of
string and asking for the electric field of the charge distribution. In
magnetostatics, one sends an electric current through a thin wire and asks for the
resulting magnetic field. In general, the string or the wire, being a curve in
space, has a parametric equation given, in Cartesian coordinates say, by

$$x = f(t), \qquad y = g(t), \qquad z = h(t),$$

where f, g, and h are known functions of the parameter t.
The problems of gravity are entirely analogous to those of electrostatics. The
master equation of electrostatics is Equation (3.3), which we reproduce here
for convenience:


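$$\mathbf E = \int_\Omega \frac{k_e\,dq(\mathbf r')}{|\mathbf r - \mathbf r'|^3}\,(\mathbf r - \mathbf r'), \qquad \Phi = \int_\Omega \frac{k_e\,dq(\mathbf r')}{|\mathbf r - \mathbf r'|}. \tag{4.3}$$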


Cartesian Coordinates 

Let us assume that Cartesian coordinates are suitable for the problem, and we
want to calculate the electrostatic field at a point P with coordinates (x, y, z)
as shown in Figure 4.2. We reduce the integrals in Equation (4.3) to single
integrals by calculating their various parts entirely in terms of t. First we
note that the source point P' lies on the curve, and therefore, its coordinates
(x', y', z') are functions of t. Since we are using Cartesian coordinates, the
components of the position vector of P' are the same as the source point's
coordinates. Therefore, $\mathbf r' = x'\mathbf e_x + y'\mathbf e_y + z'\mathbf e_z = (x', y', z')$.
Figure 4.2: Electrostatic field of a general linear charge distribution.


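The element of charge

$$dq(\mathbf r') = \lambda(\mathbf r')\,dl(\mathbf r') = \lambda(\mathbf r')\sqrt{(dx')^2 + (dy')^2 + (dz')^2} \tag{4.4}$$

turns into a function of t (times dt) after the substitutions

$$x' = f(t), \quad y' = g(t), \quad z' = h(t), \qquad dx' = f'(t)\,dt, \quad dy' = g'(t)\,dt, \quad dz' = h'(t)\,dt.$$

Similarly,

$$\mathbf r - \mathbf r' = x\mathbf e_x + y\mathbf e_y + z\mathbf e_z - x'\mathbf e_x - y'\mathbf e_y - z'\mathbf e_z = (x - x')\,\mathbf e_x + (y - y')\,\mathbf e_y + (z - z')\,\mathbf e_z$$

and

$$|\mathbf r - \mathbf r'| = \sqrt{(x - x')^2 + (y - y')^2 + (z - z')^2}, \tag{4.5}$$

$$|\mathbf r - \mathbf r'|^3 = \left[(x - x')^2 + (y - y')^2 + (z - z')^2\right]^{3/2}. \tag{4.6}$$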


Substituting all the above in Equation (4.3) yields an integral in t for E and
another integral in t for Φ. The limits of these integrals are determined from
the parametric equation of the curve describing the linear charge distribution.
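The recipe above (substitute the parametric functions, build dq from Equation (4.4), and integrate over t) translates directly into a short numerical routine. The following Python sketch is a hedged illustration rather than the book's method: the function name `line_charge`, its argument conventions, the choice k_e = 1, and the midpoint rule are all assumptions made for the example.

```python
import math


def line_charge(curve, dcurve, lam, t0, t1, point, ke=1.0, n=4000):
    """Midpoint-rule evaluation of Eq. (4.3) for a curve x'=f(t), y'=g(t), z'=h(t).

    curve(t)  -> (f(t), g(t), h(t))       position of the source point P'
    dcurve(t) -> (f'(t), g'(t), h'(t))    used for dl in Eq. (4.4)
    lam(t)    -> linear charge density at parameter t
    Returns (E, Phi) at `point`, with E as an (Ex, Ey, Ez) tuple.
    """
    dt = (t1 - t0) / n
    E = [0.0, 0.0, 0.0]
    phi = 0.0
    for i in range(n):
        t = t0 + (i + 0.5) * dt                              # midpoint of subinterval
        dq = lam(t) * math.sqrt(sum(d * d for d in dcurve(t))) * dt
        sep = [p - s for p, s in zip(point, curve(t))]      # r - r'
        dist = math.sqrt(sum(c * c for c in sep))           # |r - r'|, Eq. (4.5)
        for k in range(3):
            E[k] += ke * dq * sep[k] / dist**3              # cube as in Eq. (4.6)
        phi += ke * dq / dist
    return tuple(E), phi


# Usage (hypothetical numbers): a ring of radius 1 in the xy-plane with lambda = 1,
# observed on its axis at z = 2.
ring = lambda t: (math.cos(t), math.sin(t), 0.0)
dring = lambda t: (-math.sin(t), math.cos(t), 0.0)
E, phi = line_charge(ring, dring, lambda t: 1.0, 0.0, 2 * math.pi, (0.0, 0.0, 2.0))
print(E, phi)
```

For this ring the output should agree with the familiar on-axis result E_z = k_eQz/(R² + z²)^{3/2}, with Q = 2πRλ, while E_x and E_y vanish to rounding error.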

As a general rule, in order to find the components of the field along a unit 
vector, we use Box 1.1.2, i.e., we take the dot product of the field with that 
unit vector. This involves taking the dot product of the integrand with the 
unit vector. In the case of Cartesian unit vectors, this procedure simply picks 
out the integral multiplying one of the unit vectors. For other coordinate 
systems, this is not the case, as we shall see shortly. 


Box 4.1.2. Although the geometry of the source (charge distribution) may 
dictate a particular coordinate system, the components of the field can be 
calculated in any coordinate system desired. 
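Box 4.1.2 amounts to a projection: whatever coordinates the source suggests, a component of the field is its dot product with a unit vector at the field point. A minimal Python sketch, assuming the field is already available in Cartesian components; the helper names are hypothetical.

```python
import math


def cylindrical_unit_vectors(phi):
    """Cartesian components of e_rho, e_phi, e_z at azimuth phi."""
    return ((math.cos(phi), math.sin(phi), 0.0),
            (-math.sin(phi), math.cos(phi), 0.0),
            (0.0, 0.0, 1.0))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cylindrical_components(vec_xyz, phi):
    """(F_rho, F_phi, F_z) of a vector given by its Cartesian components,
    obtained as dot products with the cylindrical unit vectors (Box 1.1.2)."""
    return tuple(dot(vec_xyz, e) for e in cylindrical_unit_vectors(phi))


# A vector along +x, described at a field point with azimuth 90 degrees:
print(cylindrical_components((1.0, 0.0, 0.0), math.pi / 2))
```

Because the cylindrical unit vectors are orthonormal, the projection preserves the magnitude of the vector; only its description changes.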
Thus, by dot-multiplying the integrand by $\mathbf e_\rho$, $\mathbf e_\varphi$, and $\mathbf e_z$ and expressing the dot
products $\mathbf e_\rho\cdot\mathbf e_x$, $\mathbf e_\varphi\cdot\mathbf e_x$, etc., in terms of Cartesian coordinates, we can obtain
$E_\rho$, $E_\varphi$, and $E_z$ as integrals over t. A similar derivation gives the electric
potential Φ as an integral over t. Although a formula can be obtained for the
components of the electric field for a general curve (see Problem 4.3), it is
best to learn the formalism by an example.

Example 4.1.2. The simplest example of the general discussion above is a thin
rod of length L that is uniformly charged with constant linear density λ. We want
to find the electric field and the electrostatic potential at an arbitrary point P in
space, as shown in Figure 4.3(a).

As discussed at the beginning of this section, it pays to choose one’s coordinates 
wisely. Clearly, the rod defines an axis naturally. So, let us choose our z-axis to lie 
along the rod. Once this is done, we are free to move the origin up and down, and 
orient the x- and y-axes. Let us use this freedom to put the field (or observation)
point P on the x-axis. We then have a situation depicted in Figure 4.3(b).

To continue, we need the parametric equation of the rod. Clearly, the x' and 
y' parts have the (unique) “parameterization” x' = 0 and y' = 0. There are many 
ways to parameterize the z' part of the curve. However, in situations involving only 
one coordinate, it is most natural to set that coordinate equal to the parameter t. 
So, we choose the following simple parameterization: 

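$$x' = 0, \qquad y' = 0, \qquad z' = t, \qquad a \le t \le a + L \equiv b.$$

Substituting this and $\mathbf r = x\mathbf e_x$ in the expression for $\mathbf r - \mathbf r'$ and in Equations (4.5) and (4.6) yields

$$\mathbf r - \mathbf r' = x\mathbf e_x - t\mathbf e_z,$$

as well as $|\mathbf r - \mathbf r'| = \sqrt{x^2 + t^2}$ and $|\mathbf r - \mathbf r'|^3 = (x^2 + t^2)^{3/2}$.

Putting all this in Equation (4.3), with $dq = \lambda\,dt$ (because $dl = dz' = dt$), yields

$$\mathbf E(x, y, z) = \int_a^b \frac{k_e\lambda}{(x^2 + t^2)^{3/2}}\,(x\,\mathbf e_x - t\,\mathbf e_z)\,dt. \tag{4.7}$$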



Figure 4.3: Electrostatic field of a uniformly charged rod of length L. (a) The point
P and the rod, and (b) a convenient Cartesian coordinate system for the calculation of
the field. The figure assumes a negative λ.
To find the components of the field in any coordinate system, dot-multiply Equation
(4.7) by the unit vectors of that coordinate system. For Cartesian components,
$E_x = \mathbf E\cdot\mathbf e_x$, which picks the term multiplying $\mathbf e_x$ in (4.7); $E_y = \mathbf E\cdot\mathbf e_y$, which is
zero; $E_z = \mathbf E\cdot\mathbf e_z$, which picks the term multiplying $\mathbf e_z$ in (4.7). Thus,

$$
\begin{aligned}
E_x &= k_e\lambda x\int_a^b \frac{dt}{(x^2 + t^2)^{3/2}} = \frac{k_e\lambda}{x}\left(\frac{b}{\sqrt{x^2 + b^2}} - \frac{a}{\sqrt{x^2 + a^2}}\right),\\
E_y &= 0,\\
E_z &= -k_e\lambda\int_a^b \frac{t\,dt}{(x^2 + t^2)^{3/2}} = k_e\lambda\left(\frac{1}{\sqrt{x^2 + b^2}} - \frac{1}{\sqrt{x^2 + a^2}}\right).
\end{aligned}
\tag{4.8}
$$


It is instructive to consider special cases of these formulas, such as when a = −L/2
and b = +L/2 (especially when L is large compared to x), which may be more
familiar to the reader. We leave such considerations as exercises.

The electrostatic potential can be obtained similarly. From Equation (4.3), we
get

$$\Phi(x, y, z) = \int_a^b \frac{k_e\lambda}{(x^2 + t^2)^{1/2}}\,dt = k_e\lambda\ln\left(t + \sqrt{x^2 + t^2}\right)\Big|_a^b = k_e\lambda\ln\frac{b + \sqrt{x^2 + b^2}}{a + \sqrt{x^2 + a^2}}.$$
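The closed forms in Equation (4.8) and the potential above are easy to mistype, so a numerical cross-check is worthwhile. The Python sketch below codes them directly; `rod_field` and the parameter `ke_lam` (standing for k_eλ) are illustrative names, and the rewriting of t + √(x² + t²) for negative t is a numerical precaution added here, not part of the text.

```python
import math


def rod_field(x, a, b, ke_lam=1.0):
    """E_x and E_z from Eq. (4.8) and the potential Phi for a uniformly charged
    rod on the z-axis from z = a to z = b, at the point (x, 0, 0), x != 0.
    ke_lam stands for the product k_e * lambda."""
    ra, rb = math.hypot(x, a), math.hypot(x, b)   # sqrt(x^2 + a^2), sqrt(x^2 + b^2)

    def t_plus_root(t, rt):
        # t + sqrt(x^2 + t^2), rewritten as x^2 / (sqrt(x^2 + t^2) - t) when t < 0
        # to avoid cancellation between a large negative t and the square root.
        return t + rt if t >= 0 else x * x / (rt - t)

    Ex = ke_lam / x * (b / rb - a / ra)
    Ez = ke_lam * (1.0 / rb - 1.0 / ra)
    Phi = ke_lam * math.log(t_plus_root(b, rb) / t_plus_root(a, ra))
    return Ex, Ez, Phi


# Symmetric rod (a = -L/2, b = L/2) that is long compared with x:
Ex, Ez, Phi = rod_field(1.0, -1e6, 1e6)
print(Ex, Ez)
```

For a = −L/2 and b = L/2 with L ≫ x, E_z vanishes by symmetry and E_x approaches 2k_eλ/x, the field of an infinite line charge.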


Cylindrical Coordinates

For cylindrical coordinates the components of the position vector of P' are
not the same as the coordinates of P'. In fact, $\mathbf r' = \rho'\mathbf e_{\rho'} + z'\mathbf e_z$.

Various parts of the “master” equation (4.3) [or (3.3)] can be calculated
as before (this time, of course, in cylindrical coordinates) and the results
substituted in it to arrive at the expression for E entirely in terms of t. Thus

$$dq(\mathbf r') = \lambda(\mathbf r')\,dl(\mathbf r') = \lambda(\mathbf r')\sqrt{(d\rho')^2 + \rho'^2(d\varphi')^2 + (dz')^2}, \tag{4.9}$$

where use has been made of Equation (2.29). Similarly, we have

$$\mathbf r - \mathbf r' = \rho\,\mathbf e_\rho + z\,\mathbf e_z - \rho'\,\mathbf e_{\rho'} - z'\,\mathbf e_z = \rho\,\mathbf e_\rho - \rho'\,\mathbf e_{\rho'} + (z - z')\,\mathbf e_z, \tag{4.10}$$

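which leads to the absolute value

$$|\mathbf r - \mathbf r'| = \sqrt{(\mathbf r - \mathbf r')\cdot(\mathbf r - \mathbf r')} = \sqrt{[\rho\,\mathbf e_\rho - \rho'\,\mathbf e_{\rho'} + (z - z')\,\mathbf e_z]\cdot[\rho\,\mathbf e_\rho - \rho'\,\mathbf e_{\rho'} + (z - z')\,\mathbf e_z]}.$$

Carrying out the dot product and keeping in mind that $\mathbf e_\rho$ and $\mathbf e_{\rho'}$ are neither
the same nor perpendicular to each other, but make the two different angles
φ and φ' with the x-axis (so that $\mathbf e_\rho\cdot\mathbf e_{\rho'} = \cos(\varphi - \varphi')$), we obtain

$$|\mathbf r - \mathbf r'| = \sqrt{\rho^2 + \rho'^2 - 2\rho\rho'\cos(\varphi - \varphi') + (z - z')^2},$$

$$|\mathbf r - \mathbf r'|^3 = \left\{\rho^2 + \rho'^2 - 2\rho\rho'\cos(\varphi - \varphi') + (z - z')^2\right\}^{3/2}.$$

caution!
coordinates and
components are
not the same.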
Putting everything together, we obtain

$$\mathbf E = \int_\Omega \frac{k_e\lambda(\mathbf r')\sqrt{(d\rho')^2 + \rho'^2(d\varphi')^2 + (dz')^2}}{\left\{\rho^2 + \rho'^2 - 2\rho\rho'\cos(\varphi - \varphi') + (z - z')^2\right\}^{3/2}}\,\left(\rho\,\mathbf e_\rho - \rho'\,\mathbf e_{\rho'} + (z - z')\,\mathbf e_z\right). \tag{4.11}$$


To find components in any coordinate system, use Box 1.1.2 and take the dot
product of Equation (4.11) with the appropriate unit vectors. The electrostatic
potential is derived in a similar way.

Example 4.1.3. Let us reconsider the example of a rod. Obviously we should
choose our z-axis along the rod. We further move the origin so that P ends up in the
xy-plane (see Figure 4.4). This will reduce $\mathbf r$ to $\rho\,\mathbf e_\rho$. The simplest parameterization
of the rod is

$$\rho' = 0, \qquad z' = t, \qquad a \le t \le a + L \equiv b.$$

We note that φ' is undefined. This poses no problem because, as will be seen below,
it will drop out of the equations. Putting these in Equation (4.11) we obtain

$$
\begin{aligned}
\mathbf E &= k_e\lambda\int_a^b \frac{\sqrt{(0)^2 + (0)^2(d\varphi')^2 + (dz')^2}}{\left\{\rho^2 + (0)^2 - 2\rho(0)\cos(\varphi - \varphi') + (0 - z')^2\right\}^{3/2}}\left[\rho\,\mathbf e_\rho - (0)\,\mathbf e_{\rho'} + (0 - z')\,\mathbf e_z\right]\\
&= k_e\lambda\int_a^b \frac{dt}{(\rho^2 + t^2)^{3/2}}\,(\rho\,\mathbf e_\rho - t\,\mathbf e_z).
\end{aligned}
\tag{4.12}
$$


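To find the components of the electric field, take the dot product of one of the
unit vectors of a coordinate system and Equation (4.12). For the ρ component, we
have

$$
\begin{aligned}
E_\rho = \mathbf E\cdot\mathbf e_\rho &= k_e\lambda\int_a^b \frac{dt}{(\rho^2 + t^2)^{3/2}}\,(\rho\,\mathbf e_\rho - t\,\mathbf e_z)\cdot\mathbf e_\rho = k_e\lambda\int_a^b \frac{dt}{(\rho^2 + t^2)^{3/2}}\,\big(\rho\,\underbrace{\mathbf e_\rho\cdot\mathbf e_\rho}_{=1} - t\,\underbrace{\mathbf e_z\cdot\mathbf e_\rho}_{=0}\big)\\
&= k_e\lambda\rho\int_a^b \frac{dt}{(\rho^2 + t^2)^{3/2}} = \frac{k_e\lambda}{\rho}\left(\frac{b}{\sqrt{\rho^2 + b^2}} - \frac{a}{\sqrt{\rho^2 + a^2}}\right),
\end{aligned}
\tag{4.13}
$$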



Figure 4.4: Electrostatic field of a uniformly charged rod of length L in cylindrical
coordinates. The figure assumes a negative λ.
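for the φ component, we obtain

$$E_\varphi = \mathbf E\cdot\mathbf e_\varphi = k_e\lambda\int_a^b \frac{dt}{(\rho^2 + t^2)^{3/2}}\,(\rho\,\mathbf e_\rho - t\,\mathbf e_z)\cdot\mathbf e_\varphi = k_e\lambda\int_a^b \frac{dt}{(\rho^2 + t^2)^{3/2}}\,\big(\rho\,\underbrace{\mathbf e_\rho\cdot\mathbf e_\varphi}_{=0} - t\,\underbrace{\mathbf e_z\cdot\mathbf e_\varphi}_{=0}\big) = 0. \tag{4.14}$$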


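Note how the dependence on φ has completely disappeared because of the azimuthal
symmetry of the rod. Finally the z component is

$$
\begin{aligned}
E_z = \mathbf E\cdot\mathbf e_z &= k_e\lambda\int_a^b \frac{dt}{(\rho^2 + t^2)^{3/2}}\,(\rho\,\mathbf e_\rho - t\,\mathbf e_z)\cdot\mathbf e_z = k_e\lambda\int_a^b \frac{dt}{(\rho^2 + t^2)^{3/2}}\,\big(\rho\,\underbrace{\mathbf e_\rho\cdot\mathbf e_z}_{=0} - t\,\underbrace{\mathbf e_z\cdot\mathbf e_z}_{=1}\big)\\
&= -k_e\lambda\int_a^b \frac{t\,dt}{(\rho^2 + t^2)^{3/2}} = k_e\lambda\left(\frac{1}{\sqrt{\rho^2 + b^2}} - \frac{1}{\sqrt{\rho^2 + a^2}}\right).
\end{aligned}
\tag{4.15}
$$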

The electrostatic potential Φ can be calculated similarly.

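We can also find the components in Cartesian coordinates by dot-multiplying
Equation (4.12) with Cartesian unit vectors. For example,

$$
\begin{aligned}
E_x = \mathbf E\cdot\mathbf e_x &= k_e\lambda\int_a^b \frac{dt}{(\rho^2 + t^2)^{3/2}}\,(\rho\,\mathbf e_\rho - t\,\mathbf e_z)\cdot\mathbf e_x = k_e\lambda\int_a^b \frac{dt}{(\rho^2 + t^2)^{3/2}}\,\big(\rho\,\underbrace{\mathbf e_\rho\cdot\mathbf e_x}_{=\cos\varphi} - t\,\underbrace{\mathbf e_z\cdot\mathbf e_x}_{=0}\big)\\
&= k_e\lambda\rho\cos\varphi\int_a^b \frac{dt}{(\rho^2 + t^2)^{3/2}} = \frac{k_e\lambda\cos\varphi}{\rho}\left(\frac{b}{\sqrt{\rho^2 + b^2}} - \frac{a}{\sqrt{\rho^2 + a^2}}\right).
\end{aligned}
$$

$E_y$ will be the same except that instead of cos φ it will have sin φ, and $E_z$ will
be identical to the $E_z$ of Equation (4.15). When φ = 0, we recover the result of
Example 4.1.2, because ρ = x when φ = 0. ■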

All the foregoing derivations in electrostatics can be applied almost
verbatim to the theory of gravitation. The only differences are the appearance of
−G instead of $k_e$ (the minus sign reflects the attractive nature of gravity) and the
interpretation of λ as linear mass density.


4.1.3 Examples from Magnetostatics 


Probably the most realistic physical application of single integrals appears in 
the calculation of magnetic fields of currents in (thin) wires. Before looking 
at examples, let us briefly review magnetism. 

We already mentioned in Chapter 1 that the magnetic field of N (slowly)
moving point charges is given by³

$$\mathbf B = \sum_{k=1}^{N} \frac{k_m q_k\,\mathbf v_k \times (\mathbf r - \mathbf r_k)}{|\mathbf r - \mathbf r_k|^3}. \tag{4.16}$$


3 “Slow” compared to the speed of light, which is 3 × 10⁸ m/s.
Figure 4.5: Magnetic field of a moving charge distribution. (a) All charges in motion
with a “sample” singled out. The vectors show the velocities of some of the charges in
the sample. (b) The sample is described by a charge $\Delta q_j$ and an average velocity $\mathbf v_j$.


In a typical situation, N is of the order of 10²⁵ or more. So, instead of
adding all the terms individually, we lump together those that are close to
one another, i.e., in a small region, and subsequently describe the situation
by a current density (see Figure 4.5). This boils down to writing Equation
(4.16) as a sum over M small regions:

$$\mathbf B \approx \sum_{j=1}^{M} \frac{k_m\,\Delta q_j\,\mathbf v_j \times (\mathbf r - \mathbf r_j)}{|\mathbf r - \mathbf r_j|^3},$$

where $\Delta q_j$ is the amount of charge in the jth region, $\mathbf v_j$ is the average velocity
of all charges in the jth region, and $\mathbf r_j$ is the position vector of the “center”
of the jth region. We can rewrite the equation above as

$$\mathbf B \approx \sum_{j=1}^{M} \frac{k_m\,[\Delta q(\mathbf r_j)\,\mathbf v(\mathbf r_j)] \times (\mathbf r - \mathbf r_j)}{|\mathbf r - \mathbf r_j|^3}.$$

Biot-Savart law

In the limit that M → ∞ and Δq → 0, we obtain


Box 4.1.3. The magnetic field of a moving charge distribution is given
by

$$\mathbf B(\mathbf r) = k_m\int_\Omega \frac{dq(\mathbf r')\,\mathbf v(\mathbf r') \times (\mathbf r - \mathbf r')}{|\mathbf r - \mathbf r'|^3}. \tag{4.17}$$

This is the most general form of the Biot-Savart law.


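The product of the element of charge and velocity appearing in the equation
is related to the various forms of current we may encounter. These are
described below:

volume current density: $dq(\mathbf r')\,\mathbf v(\mathbf r') = \underbrace{\rho(\mathbf r')\,\mathbf v(\mathbf r')}_{=\mathbf J(\mathbf r')}\,dV(\mathbf r') = \mathbf J(\mathbf r')\,dV(\mathbf r')$,

surface current density: $dq(\mathbf r')\,\mathbf v(\mathbf r') = \underbrace{\sigma(\mathbf r')\,\mathbf v(\mathbf r')}_{=\mathbf j(\mathbf r')}\,da(\mathbf r') = \mathbf j(\mathbf r')\,da(\mathbf r')$,

linear current density: $dq(\mathbf r')\,\mathbf v(\mathbf r') = \underbrace{\lambda(\mathbf r')\,\mathbf v(\mathbf r')}_{=\mathbf I(\mathbf r')}\,dl(\mathbf r') = \mathbf I(\mathbf r')\,dl(\mathbf r')$.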

The volume current density $\mathbf J(\mathbf r')$ describes a situation in which charges are
free to move in all directions. The surface current density $\mathbf j(\mathbf r')$ is used when
charges are confined to a surface. The most familiar current density is the
linear current density, which is usually rewritten as

$$\mathbf I(\mathbf r')\,dl(\mathbf r') = I\,d\mathbf l(\mathbf r') = I\,d\mathbf r'.$$

This follows from the fact that $\mathbf I(\mathbf r')$ is in the same direction as the velocity
(at $\mathbf r'$) which, since charges are confined to a curve (the wire), has the same
direction as the (infinitesimal) tangent displacement along the wire, namely
$d\mathbf r'$.

We are particularly interested in the linear case as shown in Figure 4.6.

Thus, assuming that the current I is constant (for a steady current it has to be,
due to charge conservation), we obtain for circuits

Biot-Savart law

Box 4.1.4. The general expression for the magnetic field of a circuit is
given by

$$\mathbf B(\mathbf r) = k_m I\oint \frac{d\mathbf r' \times (\mathbf r - \mathbf r')}{|\mathbf r - \mathbf r'|^3}, \tag{4.18}$$

where the circle on the integral sign implies a closed loop.
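Equation (4.18) can also be evaluated by brute force: replace the circuit by many short straight pieces, treat each piece as one dr' located at its midpoint, and add the contributions. The Python sketch below does this under stated assumptions: the circuit is given as a hypothetical list of vertices ordered along the current, k_mI is set through the illustrative parameter `km_I`, and the midpoint rule is a choice made for the example.

```python
import math


def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def loop_field(vertices, point, km_I=1.0):
    """Discretized Eq. (4.18) for a closed polygonal circuit.

    vertices: (x, y, z) points listed in the direction of the current; the last
    vertex is joined back to the first. Each edge is one element dr', placed at
    its midpoint (midpoint rule). km_I stands for the product k_m * I.
    """
    B = [0.0, 0.0, 0.0]
    n = len(vertices)
    for i in range(n):
        p, q = vertices[i], vertices[(i + 1) % n]
        dr = [qk - pk for pk, qk in zip(p, q)]                # dr'
        mid = [(pk + qk) / 2 for pk, qk in zip(p, q)]         # r'
        sep = [rk - mk for rk, mk in zip(point, mid)]         # r - r'
        dist3 = math.sqrt(sum(s * s for s in sep)) ** 3       # |r - r'|^3
        c = cross(dr, sep)
        for k in range(3):
            B[k] += km_I * c[k] / dist3
    return B


def circle(radius, n):
    """n points on a counterclockwise circle in the xy-plane."""
    return [(radius * math.cos(2 * math.pi * k / n),
             radius * math.sin(2 * math.pi * k / n), 0.0) for k in range(n)]


# Usage: field at the center of a circular loop of radius 1 (k_m I = 1).
print(loop_field(circle(1.0, 2000), (0.0, 0.0, 0.0)))
```

At the center of a circle of radius R the sum should approach 2πk_mI/R as the number of pieces grows; for a smooth loop the discretization error decreases roughly as the inverse square of the number of pieces.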


This equation is independent of any coordinate system. We now specialize
to Cartesian and cylindrical systems.



Figure 4.6: A general current filament described parametrically and used to calculate 
the magnetic field in Cartesian coordinates. 
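Cartesian Coordinates

To obtain the magnetic field we substitute

$$
\begin{aligned}
\mathbf r &= x\,\mathbf e_x + y\,\mathbf e_y + z\,\mathbf e_z,\\
\mathbf r' &= x'\,\mathbf e_x + y'\,\mathbf e_y + z'\,\mathbf e_z,\\
d\mathbf r' &= \mathbf e_x\,dx' + \mathbf e_y\,dy' + \mathbf e_z\,dz',\\
\mathbf r - \mathbf r' &= (x - x')\,\mathbf e_x + (y - y')\,\mathbf e_y + (z - z')\,\mathbf e_z,\\
|\mathbf r - \mathbf r'|^3 &= \left[(x - x')^2 + (y - y')^2 + (z - z')^2\right]^{3/2}
\end{aligned}
$$

in Equation (4.18). For the cross product, we need to expand the determinant

$$d\mathbf r' \times (\mathbf r - \mathbf r') = \det\begin{pmatrix} \mathbf e_x & \mathbf e_y & \mathbf e_z\\ dx' & dy' & dz'\\ x - x' & y - y' & z - z' \end{pmatrix}$$

using Figure 1.5.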
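The cofactor expansion behind that determinant, with its alternating signs, is sketched below in Python. The function name is hypothetical; the first row of the determinant holds the unit vectors, and each returned entry is the coefficient of e_x, e_y, or e_z.

```python
def cross_by_determinant(dr, sep):
    """Expand det[[e_x, e_y, e_z], [dx', dy', dz'], [x-x', y-y', z-z']]
    along its first row. dr = (dx', dy', dz'), sep = (x-x', y-y', z-z')."""
    dx, dy, dz = dr
    X, Y, Z = sep
    ex = dy * Z - dz * Y          # minor of e_x, sign +
    ey = -(dx * Z - dz * X)       # minor of e_y, sign - (checkerboard pattern)
    ez = dx * Y - dy * X          # minor of e_z, sign +
    return (ex, ey, ez)


# e_x cross e_y should give e_z:
print(cross_by_determinant((1, 0, 0), (0, 1, 0)))   # (0, 0, 1)
```

The result is perpendicular to both dr' and r − r', and it changes sign when the two lower rows are swapped, as any determinant does.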


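Cylindrical Coordinates

The cylindrical coordinates can be handled in exact analogy with the Cartesian
case. Using Equations (1.19) and (2.28), we have

$$
\begin{aligned}
\mathbf r &= \rho\,\mathbf e_\rho + z\,\mathbf e_z, \qquad \mathbf r' = \rho'\,\mathbf e_{\rho'} + z'\,\mathbf e_z,\\
\mathbf r - \mathbf r' &= \rho\,\mathbf e_\rho - \rho'\,\mathbf e_{\rho'} + (z - z')\,\mathbf e_z,\\
d\mathbf r' &= \mathbf e_{\rho'}\,d\rho' + \mathbf e_{\varphi'}\,\rho'\,d\varphi' + \mathbf e_z\,dz',\\
|\mathbf r - \mathbf r'|^3 &= \left\{\rho^2 + \rho'^2 - 2\rho\rho'\cos(\varphi - \varphi') + (z - z')^2\right\}^{3/2}.
\end{aligned}
\tag{4.19}
$$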

The cross product cannot be done using determinants because not everything
is written in terms of a single set of three mutually perpendicular unit vectors:
$\mathbf e_\rho$ is different from $\mathbf e_{\rho'}$ but not perpendicular to it. In fact, this difference is the
cause of the appearance of the cosine term in the last equation of (4.19). To
find the cross product, we simply multiply the two terms and use the following 
relations, most of which should be familiar, and the unfamiliar ones can be 
obtained using Figure 4.7: 


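$$
\begin{gathered}
\mathbf e_{\rho'} \times \mathbf e_\rho = \mathbf e_z\sin(\varphi - \varphi'), \qquad \mathbf e_{\rho'} \times \mathbf e_z = -\mathbf e_{\varphi'},\\
\mathbf e_{\varphi'} \times \mathbf e_\rho = -\mathbf e_z\cos(\varphi - \varphi'), \qquad \mathbf e_{\varphi'} \times \mathbf e_{\rho'} = -\mathbf e_z, \qquad \mathbf e_{\varphi'} \times \mathbf e_z = \mathbf e_{\rho'},\\
\mathbf e_z \times \mathbf e_\rho = \mathbf e_\varphi, \qquad \mathbf e_z \times \mathbf e_{\rho'} = \mathbf e_{\varphi'}.
\end{gathered}
\tag{4.20}
$$

With these relations, the cross product can be written as

$$
\begin{aligned}
d\mathbf r' \times (\mathbf r - \mathbf r') = {}& \mathbf e_z\left[\rho'^2\,d\varphi' + \rho\sin(\varphi - \varphi')\,d\rho' - \rho\rho'\cos(\varphi - \varphi')\,d\varphi'\right]\\
& - \mathbf e_{\varphi'}\left[(z - z')\,d\rho' + \rho'\,dz'\right] + \mathbf e_\varphi\,\rho\,dz' + \mathbf e_{\rho'}\,\rho'(z - z')\,d\varphi'.
\end{aligned}
\tag{4.21}
$$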
Figure 4.7: The orientation of some of the cylindrical unit vectors drawn for the
calculation of cross products. 

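To find the components of the magnetic field, we substitute this in
Equation (4.18), take the dot product of cylindrical unit vectors with the
integrand, and use

$$
\begin{aligned}
\mathbf e_\rho\cdot\mathbf e_{\rho'} &= \cos(\varphi' - \varphi), & \mathbf e_{\rho'}\cdot\mathbf e_\varphi &= \sin(\varphi' - \varphi),\\
\mathbf e_\rho\cdot\mathbf e_{\varphi'} &= -\sin(\varphi' - \varphi), & \mathbf e_\varphi\cdot\mathbf e_{\varphi'} &= \cos(\varphi' - \varphi),
\end{aligned}
\tag{4.22}
$$

as well as the other more obvious dot products of unit vectors.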
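Relations (4.20) and (4.22) are easy to get wrong by a sign. One hedged way to check them is to build the unit vectors from their Cartesian components, e_ρ = (cos φ, sin φ, 0) and e_φ = (−sin φ, cos φ, 0), and compare the cross and dot products numerically for arbitrary angles. The Python sketch below does exactly that; the function names are illustrative, and the seven cross products it tests are e_ρ'×e_ρ, e_ρ'×e_z, e_φ'×e_ρ, e_φ'×e_ρ', e_φ'×e_z, e_z×e_ρ, and e_z×e_ρ'.

```python
import math


def unit_vectors(phi):
    """Cartesian components of (e_rho, e_phi, e_z) at azimuth phi."""
    return ((math.cos(phi), math.sin(phi), 0.0),
            (-math.sin(phi), math.cos(phi), 0.0),
            (0.0, 0.0, 1.0))


def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def scaled(c, u):
    return tuple(c * x for x in u)


def check_cylindrical_identities(phi, phip, tol=1e-12):
    """True if the cross products (4.20) and dot products (4.22) hold for
    field-point azimuth phi and source-point azimuth phip."""
    e_rho, e_phi, e_z = unit_vectors(phi)
    e_rhop, e_phip, _ = unit_vectors(phip)
    close = lambda u, v: all(abs(a - b) < tol for a, b in zip(u, v))
    crosses = [
        close(cross(e_rhop, e_rho), scaled(math.sin(phi - phip), e_z)),
        close(cross(e_rhop, e_z), scaled(-1.0, e_phip)),
        close(cross(e_phip, e_rho), scaled(-math.cos(phi - phip), e_z)),
        close(cross(e_phip, e_rhop), scaled(-1.0, e_z)),
        close(cross(e_phip, e_z), e_rhop),
        close(cross(e_z, e_rho), e_phi),
        close(cross(e_z, e_rhop), e_phip),
    ]
    dots = [
        abs(dot(e_rho, e_rhop) - math.cos(phip - phi)) < tol,
        abs(dot(e_rhop, e_phi) - math.sin(phip - phi)) < tol,
        abs(dot(e_rho, e_phip) + math.sin(phip - phi)) < tol,
        abs(dot(e_phi, e_phip) - math.cos(phip - phi)) < tol,
    ]
    return all(crosses) and all(dots)


print(check_cylindrical_identities(0.3, 2.1))
```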

We can derive a general expression for the components of the magnetic field
in terms of the parametric functions of a general curve (see Problem 4.6).
However, a simple example will also illustrate the general procedure without
entangling the formulas with complicated expressions.

Example 4.1.4. A simple application of the foregoing general formalism is to
calculate the magnetic field of a circular loop of radius a. The choice of the axes
and origin of Figure 4.8 yields the following parameterization of the loop:

$$\rho' = a,\ d\rho' = 0; \qquad \varphi' = t,\ d\varphi' = dt; \qquad z' = 0,\ dz' = 0; \qquad 0 \le t \le 2\pi.$$

Furthermore, because of the azimuthal symmetry of the current distribution, the
final answer will be independent of φ. Thus, we can set φ equal to zero. Inserting
this information in Equations (4.19) and (4.21) gives



Figure 4.8: The geometry of the circular loop of current. 
$$|\mathbf r - \mathbf r'|^3 = \left[\rho^2 + a^2 - 2\rho a\cos t + z^2\right]^{3/2}, \qquad d\mathbf r' \times (\mathbf r - \mathbf r') = \mathbf e_z\left[a^2\,dt - \rho a\cos t\,dt\right] + \mathbf e_{\rho'}\,az\,dt.$$

The magnetic field of Equation (4.18) can now be written as

$$\mathbf B = k_m I\int_0^{2\pi} \frac{\mathbf e_z\left[a^2 - \rho a\cos t\right] + \mathbf e_{\rho'}\,az}{\left[\rho^2 + a^2 - 2\rho a\cos t + z^2\right]^{3/2}}\,dt. \tag{4.23}$$


Finally, to find the cylindrical components, dot-multiply (4.23) with the cylindrical
unit vectors and use Equation (4.22) with φ = 0 (and φ' = t):

$$B_\rho = \mathbf B\cdot\mathbf e_\rho = k_m I a\int_0^{2\pi} \frac{\overbrace{\mathbf e_z\cdot\mathbf e_\rho}^{=0}(a - \rho\cos t) + \overbrace{\mathbf e_{\rho'}\cdot\mathbf e_\rho}^{=\cos t}\,z}{(\rho^2 + a^2 - 2\rho a\cos t + z^2)^{3/2}}\,dt = k_m I a z\int_0^{2\pi} \frac{\cos t\,dt}{(\rho^2 + a^2 - 2\rho a\cos t + z^2)^{3/2}}. \tag{4.24}$$

Similarly,


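$$
\begin{aligned}
B_\varphi = \mathbf B\cdot\mathbf e_\varphi &= k_m I a\int_0^{2\pi} \frac{\overbrace{\mathbf e_z\cdot\mathbf e_\varphi}^{=0}(a - \rho\cos t) + \overbrace{\mathbf e_{\rho'}\cdot\mathbf e_\varphi}^{=\sin t}\,z}{(\rho^2 + a^2 - 2\rho a\cos t + z^2)^{3/2}}\,dt = k_m I a z\int_0^{2\pi} \frac{\sin t\,dt}{(\rho^2 + a^2 - 2\rho a\cos t + z^2)^{3/2}}\\
&= k_m I a z\left[-\frac{1}{a\rho}\left(\rho^2 + a^2 - 2a\rho\cos t + z^2\right)^{-1/2}\right]_0^{2\pi} = 0,
\end{aligned}
\tag{4.25}
$$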


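and

$$B_z = \mathbf B\cdot\mathbf e_z = k_m I a\int_0^{2\pi} \frac{\overbrace{\mathbf e_z\cdot\mathbf e_z}^{=1}(a - \rho\cos t) + \overbrace{\mathbf e_{\rho'}\cdot\mathbf e_z}^{=0}\,z}{(\rho^2 + a^2 - 2\rho a\cos t + z^2)^{3/2}}\,dt = k_m I a\int_0^{2\pi} \frac{(a - \rho\cos t)\,dt}{(\rho^2 + a^2 - 2\rho a\cos t + z^2)^{3/2}}. \tag{4.26}$$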


Once again the azimuthal symmetry prohibits a φ-component for the field. The
remaining integrals cannot be evaluated in terms of elementary functions (they lead
to elliptic integrals), but if we specialize to the case where P is on the z-axis
(i.e., when ρ = 0), the integrals become trivial. In fact, we have

$$B_\rho = k_m I a z\int_0^{2\pi} \frac{\cos t\,dt}{(a^2 + z^2)^{3/2}} = 0, \qquad B_\varphi = 0, \qquad B_z = k_m I a\int_0^{2\pi} \frac{a\,dt}{(a^2 + z^2)^{3/2}} = \frac{2\pi k_m I a^2}{(a^2 + z^2)^{3/2}}.$$
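Off the axis, Equations (4.24) to (4.26) are easily evaluated numerically. The Python sketch below applies Simpson's rule to the three integrals; the function name `ring_field`, the parameter `km_I` (standing for k_mI), and the default number of subintervals are assumptions for illustration. The field point must not lie on the wire itself, where the integrands blow up.

```python
import math


def simpson(f, lo, hi, n=2000):
    """Composite Simpson's rule; n is bumped up to an even number."""
    if n % 2:
        n += 1
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3


def ring_field(rho, z, a, km_I=1.0, n=2000):
    """Numerical values of Eqs. (4.24)-(4.26) for a circular loop of radius a,
    centered on the z-axis in the xy-plane; km_I stands for k_m * I.
    Returns (B_rho, B_phi, B_z) at the field point (rho, phi = 0, z)."""
    D = lambda t: (rho**2 + a**2 - 2 * rho * a * math.cos(t) + z**2) ** 1.5
    two_pi = 2 * math.pi
    B_rho = km_I * a * z * simpson(lambda t: math.cos(t) / D(t), 0.0, two_pi, n)
    B_phi = km_I * a * z * simpson(lambda t: math.sin(t) / D(t), 0.0, two_pi, n)
    B_z = km_I * a * simpson(lambda t: (a - rho * math.cos(t)) / D(t), 0.0, two_pi, n)
    return B_rho, B_phi, B_z


# On the axis the result should match 2*pi*k_m*I*a^2 / (a^2 + z^2)^(3/2):
print(ring_field(0.0, 0.75, 1.0), 2 * math.pi / (1.0 + 0.75**2) ** 1.5)
```

Two sanity checks follow from symmetry: B_φ should vanish at every field point, and on the axis B_z should reproduce the closed form above. Far from the loop in its own plane, B_z should approach −πk_mIa²/ρ³, the equatorial field of a magnetic dipole of moment Iπa².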
After graduating from the college of Louis-le-Grand in Paris and subsequently
spending some time in the army, Jean-Baptiste Biot entered the Ecole Polytechnique
in Paris where Monge (a noted mathematician of the time and an expert in
differential geometry) realized his potential. Because of his political views and his
participation in an attempted insurrection by the royalists against the Convention,
Biot was captured by government forces and taken prisoner. Had it not been for
Monge’s intervention and plea for his release, Biot’s promising career might have
ended.

Biot became Professor of Mathematics at the Ecole Centrale at Beauvais in
1797, and three years later joined the faculty of the College de France as Professor
of Mathematical Physics, an appointment which was due to the influence of Laplace.

Biot studied a wide range of mathematical topics, mostly on the applied
mathematics side. He made advances in astronomy, elasticity, heat, and optics while, in
pure mathematics, he also did important work in geometry. He collaborated with
Arago on the refractive properties of gases.

Biot’s most notable contribution was made in collaboration with Felix Savart
(1791-1841), who was an acoustics expert and developed the Savart disk, a device
which produced a sound wave of known frequency, using a rotating cog wheel as a
measuring device.

Biot and Savart jointly discovered that the intensity of the magnetic field set up 
by a current flowing through a wire varies inversely with the distance from the wire. 
This is a special case of what is now known as Biot-Savart’s Law and is fundamental 
to modern electromagnetic theory. 

For his work on the polarization of light passing through chemical solutions Biot 
was awarded the Rumford Medal of the Royal Society. He tried twice for the post 
of Secretary to the Academie des Sciences but lost out in 1822 to Fourier for this 
post. When Fourier died he applied again only to lose to Arago. 


4.2 Applications: Double Integrals

Whenever areas are sources of physical quantities such as fields, or interactions
take place on areas, such as pressure applied on a surface, double integrals
are used. We can be as general as in the previous section and consider a
general surface given by a parametric equation in two variables (instead of one
used for curves). However, since the geometry of surfaces is much more
complicated, and much less illuminating, we shall confine our discussion to very
simple geometries which require trivial and obvious parameterization. More
specifically, we restrict ourselves to primary surfaces of the three coordinate
systems.


4.2.1 Cartesian Coordinates 

Since we are restricting ourselves to primary surfaces, our choice for Cartesian 
coordinates is narrowed down to planes, and if we want the boundaries of the 
plane to be simple in Cartesian coordinates, we are limited to just a rectangle. 



Jean-Baptiste Biot 
1774-1862 
Example 4.2.1. We start with an example from electrostatics. A rectangular flat surface of sides a and b is charged uniformly with surface charge density σ, and we are interested in the electric field at a general point P in space. This is given by

$$\mathbf{E}(\mathbf{r}) = \iint_\Omega \frac{k_e\, dq(\mathbf{r}')}{|\mathbf{r}-\mathbf{r}'|^3}\,(\mathbf{r}-\mathbf{r}'),$$

with r = x e_x + y e_y + z e_z = (x, y, z) and r' = x' e_x + y' e_y = (x', y', 0), where we have chosen the plane of the rectangle to be the xy-plane. If we choose the center of the rectangle to be our origin, our z-axis perpendicular to the plane of the rectangle, and our x- and y-axes parallel to the sides as shown in Figure 4.9, then the element of area coincides with the third primary element, and we can write

$$dq(\mathbf{r}') = \sigma(\mathbf{r}')\, da(\mathbf{r}') = \sigma\, dx'\, dy'.$$

We also have

$$\mathbf{r}-\mathbf{r}' = (x-x')\,\mathbf{e}_x + (y-y')\,\mathbf{e}_y + z\,\mathbf{e}_z = (x-x',\, y-y',\, z),$$
$$|\mathbf{r}-\mathbf{r}'| = \sqrt{(x-x')^2 + (y-y')^2 + z^2},$$
$$|\mathbf{r}-\mathbf{r}'|^3 = \{(x-x')^2 + (y-y')^2 + z^2\}^{3/2}.$$

Inserting all these relations in the expression for E, we obtain

$$\mathbf{E} = \iint_\Omega \frac{k_e \sigma\, dx'\, dy'}{\{(x-x')^2 + (y-y')^2 + z^2\}^{3/2}}\,\big[(x-x')\,\mathbf{e}_x + (y-y')\,\mathbf{e}_y + z\,\mathbf{e}_z\big]$$

with components

$$E_x = k_e\sigma \iint_\Omega \frac{(x-x')\, dx'\, dy'}{\{(x-x')^2 + (y-y')^2 + z^2\}^{3/2}},$$
$$E_y = k_e\sigma \iint_\Omega \frac{(y-y')\, dx'\, dy'}{\{(x-x')^2 + (y-y')^2 + z^2\}^{3/2}},$$
$$E_z = k_e\sigma z \iint_\Omega \frac{dx'\, dy'}{\{(x-x')^2 + (y-y')^2 + z^2\}^{3/2}},$$

where everything independent of the variables of integration, x' and y', is taken out of the integrals.

We have already discussed a general procedure for evaluating multiple integrals by reducing them to lower-dimensional integrals. We follow the same procedure here: the y' integration has the lower limit −b/2 and the upper limit +b/2, both independent of x'.⁴ Similarly, the x' integration has −a/2 and a/2 as its limits.
This means that the components can be written as

$$E_x = k_e\sigma \int_{-a/2}^{a/2} (x-x')\,dx' \int_{-b/2}^{b/2} \frac{dy'}{\{(x-x')^2 + (y-y')^2 + z^2\}^{3/2}},$$
$$E_y = k_e\sigma \int_{-a/2}^{a/2} dx' \int_{-b/2}^{b/2} \frac{(y-y')\,dy'}{\{(x-x')^2 + (y-y')^2 + z^2\}^{3/2}},$$
$$E_z = k_e\sigma z \int_{-a/2}^{a/2} dx' \int_{-b/2}^{b/2} \frac{dy'}{\{(x-x')^2 + (y-y')^2 + z^2\}^{3/2}}.$$

Note that the x' integration cannot be done until after the y' integration, because the latter has an x'-dependent integrand. ■

⁴The independence of the limits is one reason that Cartesian coordinates are useful for rectangular regions of integration. If we had chosen cylindrical coordinates, then the limits of integration, the lines y' = −b/2 and y' = b/2, would have had to be written in cylindrical coordinates, giving, for the upper limit, for example, ρ' sin φ' = b/2 or ρ' = b/(2 sin φ'). Thus a ρ' integration with limits dependent on φ' would have been involved.

Figure 4.9: Electrostatic field of a flat rectangular charge distribution.
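The nested integrals above are easy to evaluate numerically, which gives a useful check on the setup. The following is a minimal sketch and is not part of the original text. It assumes units in which k_eσ = 1 and uses a hand-written composite Simpson rule. The helper names (`simpson`, `rectangle_field`, `ez_on_axis`) are our own.

Following the text, the code does the y' integration first and the x' integration second. For a field point on the z-axis it compares E_z with the closed form 4k_eσ arctan[ab/(4z√(z² + a²/4 + b²/4))]. That expression is k_eσ times the solid angle subtended by the rectangle. The code also confirms that E_x and E_y vanish on the axis by symmetry.

```python
import math


def simpson(f, lo, hi, n=200):
    """Composite Simpson rule for f on [lo, hi] with n subintervals (n made even)."""
    if n % 2:
        n += 1
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3


def rectangle_field(x, y, z, a, b, n=200):
    """(E_x, E_y, E_z) in units of k_e*sigma for an a-by-b rectangle centred at the
    origin in the xy-plane: y' integral inside, x' integral outside."""
    def d3(xp, yp):
        return ((x - xp) ** 2 + (y - yp) ** 2 + z ** 2) ** 1.5

    def inner(numerator, xp):
        """y' integral of numerator(y') / |r - r'|^3 at fixed x'."""
        return simpson(lambda yp: numerator(yp) / d3(xp, yp), -b / 2, b / 2, n)

    ex = simpson(lambda xp: (x - xp) * inner(lambda yp: 1.0, xp), -a / 2, a / 2, n)
    ey = simpson(lambda xp: inner(lambda yp: y - yp, xp), -a / 2, a / 2, n)
    ez = z * simpson(lambda xp: inner(lambda yp: 1.0, xp), -a / 2, a / 2, n)
    return ex, ey, ez


def ez_on_axis(z, a, b):
    """Closed form of E_z/(k_e*sigma) at (0, 0, z) with z > 0 (solid-angle result)."""
    return 4 * math.atan(a * b / (4 * z * math.sqrt(z * z + a * a / 4 + b * b / 4)))


ex, ey, ez = rectangle_field(0.0, 0.0, 1.0, a=2.0, b=3.0)
print(ex, ey, ez, ez_on_axis(1.0, 2.0, 3.0))
```

As z → 0⁺ the closed form tends to 2πk_eσ = σ/(2ε₀), the familiar field of an infinite sheet.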


Having exhausted the (simple) possibilities for electrostatics (and gravity, 
since the two are almost identical), we now turn to magnetostatics. 

Example 4.2.2. Approximate the belt of a Van de Graaff machine with an isolated moving rectangle having sides a and b, and velocity v along the side b as shown in Figure 4.10. Furthermore, assume that the charges are uniformly distributed on the belt with surface charge density σ. We want to find the magnetic field of the belt at a general point P in space. Let us choose the positive y-direction to be that of the velocity. Then, Equation (4.17) becomes

$$\mathbf{B}(\mathbf{r}) = k_m \iint_\Omega \frac{\sigma\, da\, \mathbf{v}\times(\mathbf{r}-\mathbf{r}')}{|\mathbf{r}-\mathbf{r}'|^3}.$$

The geometry of this example is identical to that of Example 4.2.1. Therefore, we can immediately write the integral for B:

$$\mathbf{B}(\mathbf{r}) = k_m \iint_\Omega \frac{\sigma\, dx'\, dy'\, v\,\mathbf{e}_y \times \big[(x-x')\,\mathbf{e}_x + (y-y')\,\mathbf{e}_y + z\,\mathbf{e}_z\big]}{\{(x-x')^2 + (y-y')^2 + z^2\}^{3/2}},$$

from which the components of the magnetic field are easily calculated (using e_y × e_x = −e_z and e_y × e_z = e_x):

$$B_x = k_m\sigma v z \int_{-a/2}^{a/2} dx' \int_{-b/2}^{b/2} \frac{dy'}{\{(x-x')^2 + (y-y')^2 + z^2\}^{3/2}},$$
$$B_y = 0,$$
$$B_z = -k_m\sigma v \int_{-a/2}^{a/2} (x-x')\,dx' \int_{-b/2}^{b/2} \frac{dy'}{\{(x-x')^2 + (y-y')^2 + z^2\}^{3/2}}. \tag{4.27}$$

Figure 4.10: A rectangular distribution of moving charges whose magnetic field can be calculated using Cartesian coordinates.
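Every source element moves with the same constant velocity, so v can be taken out of the integral. The result can then be compared directly with Example 4.2.1. Under exactly the assumptions of this example (uniform σ, v = v e_y), a short derivation gives

$$\mathbf{B}(\mathbf{r}) = k_m\,\mathbf{v}\times\iint_\Omega \frac{\sigma\,da\,(\mathbf{r}-\mathbf{r}')}{|\mathbf{r}-\mathbf{r}'|^3} = \frac{k_m}{k_e}\,\mathbf{v}\times\mathbf{E}(\mathbf{r}),$$

where E is the electrostatic field of the same rectangle found in Example 4.2.1. Since e_y × e_x = −e_z and e_y × e_z = e_x, this gives

$$B_x = \frac{k_m v}{k_e}\,E_z, \qquad B_y = 0, \qquad B_z = -\frac{k_m v}{k_e}\,E_x,$$

which agrees term by term with (4.27). This shortcut relies on v being the same for every source element.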
4.2.2 Cylindrical Coordinates

The cylindrical system has two types of primary surface: planes and cylinders.
Although we considered planes in the previous subsection, we shall reconsider
them here because the third primary surface, that perpendicular to the z-axis,
gives us the possibility of solving planar problems with nonrectangular regions
of integration. Let us start with such a problem.


Example 4.2.3. In this example we want to calculate the gravitational field of a uniform surface mass distribution of density σ_m which is a segment of a planar annular region with inner radius a and outer radius b, and whose sides make an angle α as shown in Figure 4.11(a). Let us choose our origin to coincide with the center of the annular region, our x-axis to be along one of the sides, and the xy-plane to be the plane of the mass distribution [see Figure 4.11(b)].

Recall that in cylindrical coordinates, the components of the position vector of P' are not the same as the source point’s coordinates. In fact, we have

$$\mathbf{r}' = \rho'\,\mathbf{e}_{\rho'}, \qquad \mathbf{r} = \rho\,\mathbf{e}_\rho + z\,\mathbf{e}_z,$$
$$\mathbf{r}-\mathbf{r}' = \rho\,\mathbf{e}_\rho + z\,\mathbf{e}_z - \rho'\,\mathbf{e}_{\rho'},$$
$$|\mathbf{r}-\mathbf{r}'| = (\rho^2 + \rho'^2 - 2\rho\rho'\cos\varphi' + z^2)^{1/2},$$

where in the last line we have made the simplification that the field point is in the xz-plane, so that φ = 0; otherwise, we would have cos(φ − φ') instead of cos φ'. The element of mass is given by

$$dm(\mathbf{r}') = \sigma_m\, da(\mathbf{r}') = \sigma_m\,(d\rho')(\rho'\,d\varphi') = \sigma_m\,\rho'\,d\rho'\,d\varphi'.$$


Thus, the gravitational field is

$$\mathbf{g} = -\iint_\Omega \frac{G\,dm(\mathbf{r}')}{|\mathbf{r}-\mathbf{r}'|^3}(\mathbf{r}-\mathbf{r}') = -G\sigma_m \int_a^b \rho'\,d\rho' \int_0^\alpha \frac{d\varphi'\,(\rho\,\mathbf{e}_\rho + z\,\mathbf{e}_z - \rho'\,\mathbf{e}_{\rho'})}{(\rho^2 + \rho'^2 - 2\rho\rho'\cos\varphi' + z^2)^{3/2}}. \tag{4.28}$$

Figure 4.11: The annular region whose gravitational field is being calculated. The position vector of the source point and the lengths of the sides of the element of area are also shown.

To find the components, we take the dot product of this integral with the cylindrical unit vectors at P, noting that e_ρ · e_ρ' = cos φ' and e_φ · e_ρ' = sin φ'. The result will then be
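$$g_\rho = -G\sigma_m \int_a^b \rho'\,d\rho' \int_0^\alpha \frac{(\rho - \rho'\cos\varphi')\,d\varphi'}{(\rho^2 + \rho'^2 - 2\rho\rho'\cos\varphi' + z^2)^{3/2}},$$
$$g_\varphi = G\sigma_m \int_a^b \rho'\,d\rho' \int_0^\alpha \frac{\rho'\sin\varphi'\,d\varphi'}{(\rho^2 + \rho'^2 - 2\rho\rho'\cos\varphi' + z^2)^{3/2}},$$
$$g_z = -G\sigma_m z \int_a^b \rho'\,d\rho' \int_0^\alpha \frac{d\varphi'}{(\rho^2 + \rho'^2 - 2\rho\rho'\cos\varphi' + z^2)^{3/2}}. \tag{4.29}$$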


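Let us look at some special cases of this. For a complete annular region, we simply replace α with 2π:

$$g_\rho = -G\sigma_m \int_a^b \rho'\,d\rho' \int_0^{2\pi} \frac{(\rho - \rho'\cos\varphi')\,d\varphi'}{(\rho^2 + \rho'^2 - 2\rho\rho'\cos\varphi' + z^2)^{3/2}},$$
$$g_\varphi = G\sigma_m \int_a^b \rho'\,d\rho' \int_0^{2\pi} \frac{\rho'\sin\varphi'\,d\varphi'}{(\rho^2 + \rho'^2 - 2\rho\rho'\cos\varphi' + z^2)^{3/2}} = 0,$$
$$g_z = -G\sigma_m z \int_a^b \rho'\,d\rho' \int_0^{2\pi} \frac{d\varphi'}{(\rho^2 + \rho'^2 - 2\rho\rho'\cos\varphi' + z^2)^{3/2}}. \tag{4.30}$$

As expected, the φ-component has disappeared: the substitution φ' → 2π − φ' changes the sign of sin φ' but leaves the rest of the integrand and the interval of integration unchanged.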

We can further simplify the geometry by locating the field point on the z-axis. Then ρ = 0 and we have

$$g_\rho = G\sigma_m \int_a^b \frac{\rho'^2\,d\rho'}{(\rho'^2 + z^2)^{3/2}} \int_0^{2\pi} \cos\varphi'\,d\varphi' = 0, \qquad g_\varphi = 0,$$
$$g_z = -G\sigma_m z \int_a^b \frac{\rho'\,d\rho'}{(\rho'^2 + z^2)^{3/2}} \int_0^{2\pi} d\varphi' = -2\pi G\sigma_m z \int_a^b \frac{\rho'\,d\rho'}{(\rho'^2 + z^2)^{3/2}} = -2\pi G\sigma_m z \left(\frac{1}{\sqrt{a^2 + z^2}} - \frac{1}{\sqrt{b^2 + z^2}}\right).$$

If we take the limit a → 0 and b → ∞, we obtain

$$\mathbf{g} = -2\pi G\sigma_m \frac{z}{\sqrt{z^2}}\,\mathbf{e}_z = -2\pi G\sigma_m \frac{z}{|z|}\,\mathbf{e}_z,$$

where we have used Box 4.2.1 (see below). Now note that z/|z| = ±1 depending on the sign of z. When z > 0, we get (z/|z|) e_z = e_z, which is the unit normal to the surface. When z < 0, we get (z/|z|) e_z = −e_z, which is again the unit normal to (the other side of) the surface. Denoting the unit normal by e_n, we can write

$$\mathbf{g} = -2\pi G\sigma_m\,\mathbf{e}_n.$$

The electrostatic analogue of this is obtained by substituting −k_e = −1/(4πε₀) for G and σ_q for σ_m. This yields

$$\mathbf{E} = 2\pi k_e\sigma_q\,\mathbf{e}_n = \frac{\sigma_q}{2\epsilon_0}\,\mathbf{e}_n,$$

which is the field of an infinite sheet of charge with which the reader is familiar. Note that while g always points toward the sheet (opposite to e_n, because σ_m is always positive), the direction of E is determined by the sign of σ_q. ■
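The on-axis formula can be checked by integrating (4.30) directly. What follows is a minimal sketch, assuming units with Gσ_m = 1. The helper names (`annulus_field`, `gz_on_axis`) are our own. The φ' integral is done inside the ρ' integral, exactly as written in (4.30).

```python
import math


def simpson(f, lo, hi, n=200):
    """Composite Simpson rule for f on [lo, hi] with n subintervals (n made even)."""
    if n % 2:
        n += 1
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3


def annulus_field(rho, z, a, b, n=200):
    """(g_rho, g_phi, g_z) in units of G*sigma_m for a complete annulus a <= rho' <= b,
    field point at (rho, phi = 0, z); a direct transcription of Equation (4.30)."""
    two_pi = 2 * math.pi

    def d3(rp, pp):
        return (rho ** 2 + rp ** 2 - 2 * rho * rp * math.cos(pp) + z ** 2) ** 1.5

    def radial(numerator):
        # outer rho' integral of rho' times the inner phi' integral
        return simpson(
            lambda rp: rp * simpson(lambda pp: numerator(rp, pp) / d3(rp, pp), 0.0, two_pi, n),
            a, b, n)

    g_rho = -radial(lambda rp, pp: rho - rp * math.cos(pp))
    g_phi = radial(lambda rp, pp: rp * math.sin(pp))
    g_z = -z * radial(lambda rp, pp: 1.0)
    return g_rho, g_phi, g_z


def gz_on_axis(z, a, b):
    """Closed form -2*pi*z*(1/sqrt(a^2+z^2) - 1/sqrt(b^2+z^2)) in units of G*sigma_m."""
    return -2 * math.pi * z * (1 / math.sqrt(a * a + z * z) - 1 / math.sqrt(b * b + z * z))


print(annulus_field(0.0, 0.5, 1.0, 2.0), gz_on_axis(0.5, 1.0, 2.0))
```

Letting a → 0 and b → ∞ in `gz_on_axis` reproduces the infinite-sheet value −2πGσ_m for z > 0.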
4.2.3 Spherical Coordinates

One of the primary surfaces of a spherical coordinate system is a sphere, and 
since there are a lot of spherical objects around, it is useful to gain experience 
in calculations involving spheres. 

In the following, we shall be taking square roots of functions. Care needs 
to be taken when doing so: 


Box 4.2.1. For any real-valued quantity A, √(A²) = |A|, i.e., the square
root of the square of a quantity is the absolute value of that quantity.


Failure to keep this in mind will result in incorrect conclusions, as we shall 
see below. 


Example 4.2.4. In this example we are interested in the gravitational field at a general point P of a spherical cap, i.e., a segment of a spherical shell of radius a and uniform surface density σ such that the cone defined by the segment and the center of the sphere has a half-angle α (see Figure 4.12). It is clear that the choice of axes and origin resulting in the greatest simplification is as shown in Figure 4.12. Notice that P is taken to lie in the xz-plane, so that φ = 0. We can immediately write

$$\mathbf{g} = -G \iint_\Omega \frac{dm(\mathbf{r}')}{|\mathbf{r}-\mathbf{r}'|^3}(\mathbf{r}-\mathbf{r}') \tag{4.31}$$

with

$$\mathbf{r}' = a\,\mathbf{e}_{r'}, \qquad \mathbf{r} = r\,\mathbf{e}_r, \qquad \mathbf{r}-\mathbf{r}' = r\,\mathbf{e}_r - a\,\mathbf{e}_{r'},$$
$$|\mathbf{r}-\mathbf{r}'|^3 = \{r^2 + a^2 - 2ra\,\mathbf{e}_r\cdot\mathbf{e}_{r'}\}^{3/2} = \{r^2 + a^2 - 2ra(\sin\theta\sin\theta'\cos\varphi' + \cos\theta\cos\theta')\}^{3/2}, \tag{4.32}$$
$$dm(\mathbf{r}') = \sigma\,da' = \sigma a^2 \sin\theta'\,d\theta'\,d\varphi'.$$

Figure 4.12: A spherical cap whose gravitational field can be calculated using spherical coordinates.

By inserting these relations in (4.31) and dotting the result with unit vectors, we obtain the three components of g in various coordinate systems. In spherical coordinates these are
$$g_r = -G\sigma a^2 \iint_\Omega \frac{\sin\theta'\,\{r - a(\sin\theta\sin\theta'\cos\varphi' + \cos\theta\cos\theta')\}\,d\theta'\,d\varphi'}{\{r^2 + a^2 - 2ra(\sin\theta\sin\theta'\cos\varphi' + \cos\theta\cos\theta')\}^{3/2}},$$
$$g_\theta = G\sigma a^3 \iint_\Omega \frac{\sin\theta'\,(\cos\theta\sin\theta'\cos\varphi' - \sin\theta\cos\theta')\,d\theta'\,d\varphi'}{\{r^2 + a^2 - 2ra(\sin\theta\sin\theta'\cos\varphi' + \cos\theta\cos\theta')\}^{3/2}},$$
$$g_\varphi = G\sigma a^3 \iint_\Omega \frac{\sin^2\theta'\,\sin\varphi'\,d\theta'\,d\varphi'}{\{r^2 + a^2 - 2ra(\sin\theta\sin\theta'\cos\varphi' + \cos\theta\cos\theta')\}^{3/2}} = 0. \tag{4.33}$$

The region of integration is one in which θ' varies from 0 to α, and φ' from 0 to 2π. The last integral vanishes because of the φ' integration. The vanishing of the φ-component is simply the result of the azimuthal symmetry.

The general result above is not very illuminating, but if we move P to the polar axis, so that θ = 0, then the equations simplify considerably, and we get

$$g_r = -G\sigma a^2 \int_0^{2\pi} d\varphi' \int_0^\alpha \frac{\sin\theta'\,(r - a\cos\theta')\,d\theta'}{(r^2 + a^2 - 2ra\cos\theta')^{3/2}} = -2\pi G\sigma a^2 \int_0^\alpha \frac{\sin\theta'\,(r - a\cos\theta')\,d\theta'}{(r^2 + a^2 - 2ra\cos\theta')^{3/2}},$$
$$g_\theta = G\sigma a^3 \int_0^\alpha \frac{\sin^2\theta'\,d\theta'}{(r^2 + a^2 - 2ra\cos\theta')^{3/2}} \int_0^{2\pi} \cos\varphi'\,d\varphi' = 0, \qquad g_\varphi = 0.$$

The most interesting result is obtained when α = π, i.e., when we have a complete spherical shell. Then using

$$\int_0^\pi \frac{\sin\theta'\,(r - a\cos\theta')\,d\theta'}{(r^2 + a^2 - 2ra\cos\theta')^{3/2}} = \frac{1}{r^2}\left(1 - \frac{|a - r|}{a - r}\right),$$

which can be looked up in a good integral table, we obtain

$$g_r = -\frac{2\pi G\sigma a^2}{r^2}\left(1 - \frac{|a - r|}{a - r}\right), \qquad g_\theta = 0, \qquad g_\varphi = 0.$$

For points inside the shell, r < a; therefore |a − r|/(a − r) = (a − r)/(a − r) = 1, and the field vanishes. Thus, the gravitational field inside a spherical shell is zero. On the other hand, for points outside, r > a, and |a − r|/(a − r) = (r − a)/(a − r) = −1, leading to

$$g_r = -\frac{4\pi G\sigma a^2}{r^2} = -\frac{GM}{r^2},$$

where M = 4πa²σ is the total mass of the shell. This is identical to the gravitational field of a point mass M located at the center of the shell. Now, if we have a number of concentric shells, then, at a point outside the outermost one, the field must be that of a point mass at the common center having a mass equal to the total mass of all the shells. Note that each shell can have a different uniform density than the others. In particular, if we have a solid sphere, with a density which is a function of r alone, the same result holds. A density which is a function of r alone is called a spherical mass distribution. We thus have the famous result:
Box 4.2.2. When gravitationally attracting objects outside it, a spherical mass
distribution acts as if all its mass were concentrated into a point at its center.


Newton spent approximately twenty years convincing himself of this result. 

Because of the similarity between gravity and electrostatics, the conclusion above
can be applied to the electrostatic field as well. Thus, in particular, the electrostatic
field inside any uniformly charged spherical shell is zero. ■
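The integral quoted from the integral table can be checked numerically. This is a short sketch, assuming a = 1 and Gσ = 1, with helper names of our own. It evaluates the θ' integral by Simpson's rule at one point inside the shell and one point outside it.

```python
import math


def simpson(f, lo, hi, n=1000):
    """Composite Simpson rule for f on [lo, hi] with n subintervals (n made even)."""
    if n % 2:
        n += 1
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3


def shell_integral(r, a, n=1000):
    """Simpson value of the theta' integral used for the complete shell (alpha = pi)."""
    def f(t):
        return math.sin(t) * (r - a * math.cos(t)) / (r * r + a * a - 2 * r * a * math.cos(t)) ** 1.5
    return simpson(f, 0.0, math.pi, n)


def shell_integral_closed(r, a):
    """(1/r^2) * (1 - |a - r|/(a - r)), valid for r != a."""
    return (1 - abs(a - r) / (a - r)) / r ** 2


def shell_gr(r, a):
    """g_r/(G*sigma) = -2*pi*a^2 times the integral."""
    return -2 * math.pi * a * a * shell_integral(r, a)


for r in (0.5, 2.0):
    print(r, shell_integral(r, 1.0), shell_integral_closed(r, 1.0))
```

Inside the shell the numerical integral is zero to within the quadrature error. Outside it equals 2/r², so that g_r = −4πGσa²/r² = −GM/r².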

We take the final example of this section from mechanics and calculate the
moment of inertia of the foregoing shell about the polar axis. Recall that
the moment of inertia of a mass distribution about an axis is defined as

$$I = \int_\Omega R^2\,dm, \tag{4.34}$$

where R is the distance from the integration point (the location of dm) to the
reference axis.

Example 4.2.5. The moment of inertia of the spherical shell segment is obtained easily. All we need to note is that R = a sin θ'. Then Equation (4.34) gives

$$I = \iint_\Omega (a\sin\theta')^2\,\sigma a^2 \sin\theta'\,d\theta'\,d\varphi' = a^4\sigma \int_0^\alpha \sin^3\theta'\,d\theta' \int_0^{2\pi} d\varphi' = 2\pi a^4\sigma \left(\tfrac{1}{3}\cos^3\theta' - \cos\theta'\right)\Big|_0^\alpha = \frac{2\pi\sigma a^4}{3}\,(\cos^3\alpha - 3\cos\alpha + 2).$$

We can express this in terms of the total mass if we note that the area is given by

$$A = \iint_\Omega a^2 \sin\theta'\,d\theta'\,d\varphi' = 2\pi a^2 \int_0^\alpha \sin\theta'\,d\theta' = 2\pi a^2(1 - \cos\alpha),$$

so that

$$\sigma = \frac{M}{A} = \frac{M}{2\pi a^2(1 - \cos\alpha)}.$$

Therefore,

$$I = \frac{1}{3}Ma^2\,\frac{\cos^3\alpha - 3\cos\alpha + 2}{1 - \cos\alpha},$$

which reduces to I = ⅔Ma² for a complete spherical shell (with α = π).
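Two remarks on this result are easy to check. First, the numerator factors as cos³α − 3cos α + 2 = (1 − cos α)²(2 + cos α), so the result can also be written I = ⅓Ma²(1 − cos α)(2 + cos α). Second, the closed form should agree with a direct numerical evaluation of a⁴σ∫sin³θ' dθ'∫dφ'. The sketch below checks both, with M = a = 1 as assumed values and function names of our own.

```python
import math


def simpson(f, lo, hi, n=400):
    """Composite Simpson rule for f on [lo, hi] with n subintervals (n made even)."""
    if n % 2:
        n += 1
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3


def cap_inertia_numeric(M, a, alpha, n=400):
    """I = 2*pi*a^4*sigma * int_0^alpha sin^3(theta') dtheta', with sigma = M / area."""
    sigma = M / (2 * math.pi * a * a * (1 - math.cos(alpha)))
    return 2 * math.pi * a ** 4 * sigma * simpson(lambda t: math.sin(t) ** 3, 0.0, alpha, n)


def cap_inertia_closed(M, a, alpha):
    """(1/3) M a^2 (cos^3 alpha - 3 cos alpha + 2) / (1 - cos alpha)."""
    c = math.cos(alpha)
    return M * a * a * (c ** 3 - 3 * c + 2) / (3 * (1 - c))


for alpha in (0.3, 1.0, math.pi / 2, math.pi):
    print(alpha, cap_inertia_numeric(1.0, 1.0, alpha), cap_inertia_closed(1.0, 1.0, alpha))
```

The factored form also makes the limits transparent: I → 0 as α → 0, and I = ⅔Ma² at α = π.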


4.3 Applications: Triple Integrals 

To illustrate the difficulty of calculations when appropriate coordinate systems 
are not chosen, in the following example we calculate the gravitational field 
of a uniform hemisphere at a point P on its axis in Cartesian coordinates. 
Figure 4.13: Calculating the gravitational field of a hemisphere in the “unnatural” Cartesian coordinates.

Example 4.3.1. The geometry of the problem is shown in Figure 4.13. The location of P and the choice of axes indicate that

$$\mathbf{r} = z\,\mathbf{e}_z, \qquad \mathbf{r}' = x'\,\mathbf{e}_x + y'\,\mathbf{e}_y + z'\,\mathbf{e}_z,$$
$$|\mathbf{r}-\mathbf{r}'|^3 = \{x'^2 + y'^2 + (z - z')^2\}^{3/2},$$
$$dm(\mathbf{r}') = \rho_m\,dV(\mathbf{r}') = \rho_m\,dx'\,dy'\,dz',$$

where ρ_m is the uniform mass density. Thus,

$$\mathbf{g} = -\iiint_\Omega \frac{G\,dm(\mathbf{r}')}{|\mathbf{r}-\mathbf{r}'|^3}(\mathbf{r}-\mathbf{r}') = G\rho_m \iiint_\Omega \frac{dx'\,dy'\,dz'\,\{x'\,\mathbf{e}_x + y'\,\mathbf{e}_y + (z' - z)\,\mathbf{e}_z\}}{\{x'^2 + y'^2 + (z - z')^2\}^{3/2}}$$

with components

$$g_x = G\rho_m \iiint_\Omega \frac{x'\,dx'\,dy'\,dz'}{\{x'^2 + y'^2 + (z - z')^2\}^{3/2}},$$
$$g_y = G\rho_m \iiint_\Omega \frac{y'\,dx'\,dy'\,dz'}{\{x'^2 + y'^2 + (z - z')^2\}^{3/2}},$$
$$g_z = G\rho_m \iiint_\Omega \frac{(z' - z)\,dx'\,dy'\,dz'}{\{x'^2 + y'^2 + (z - z')^2\}^{3/2}}.$$


The limits of the integrals over Ω can be determined as discussed in Section 3.3. In Figure 4.13, we have chosen the first integration to be along the z-axis. Then the lower limit will be the xy-plane, or z' = 0, and the upper limit, the surface of the hemisphere. A line through a general point P' in Ω with coordinates (x', y', z'), drawn parallel to the z-axis, will hit the hemisphere at z' = √(a² − x'² − y'²). So, this will be the upper limit of the z' integration. Concentrating on the x-component for a moment, we thus write

$$g_x = G\rho_m \iint_S x'\,dx'\,dy' \int_0^{\sqrt{a^2 - x'^2 - y'^2}} \frac{dz'}{\{x'^2 + y'^2 + (z - z')^2\}^{3/2}},$$

where S is the projection of the hemispherical surface on the xy-plane. To do the remaining integrations, we refer to Figure 4.14, where the projections of the hemisphere and the point P' are shown.

Figure 4.14: The projection of Ω, a hemisphere, in the xy-plane.

It is clear that the y' integration has the lower semicircle as the lower limit and the upper semicircle as the upper limit. Finally, the x' integration has lower and upper limits of −a and +a, respectively. We, therefore, have

$$g_x = G\rho_m \int_{-a}^{+a} x'\,dx' \int_{-\sqrt{a^2 - x'^2}}^{+\sqrt{a^2 - x'^2}} dy' \int_0^{\sqrt{a^2 - x'^2 - y'^2}} \frac{dz'}{\{x'^2 + y'^2 + (z - z')^2\}^{3/2}}.$$

Instead of looking up the integrals in an integral table, we note that the integrand of the x' integration is an odd function. This is because it is the product of x', which is odd, and another function, in the form of a double integral whose integrand and limits are even functions of x'. Since the interval of integration is symmetric, the x' integration vanishes. A similar argument, with the roles of x' and y' exchanged, shows that g_y vanishes as well. This is as expected intuitively: we expect the field to be along the z-axis. Therefore, g_x = 0, g_y = 0, and

$$g_z = G\rho_m \int_{-a}^{+a} dx' \int_{-\sqrt{a^2 - x'^2}}^{+\sqrt{a^2 - x'^2}} dy' \int_0^{\sqrt{a^2 - x'^2 - y'^2}} \frac{(z' - z)\,dz'}{\{x'^2 + y'^2 + (z - z')^2\}^{3/2}}$$
$$= G\rho_m \int_{-a}^{+a} dx' \int_{-\sqrt{a^2 - x'^2}}^{+\sqrt{a^2 - x'^2}} \frac{dy'}{\sqrt{x'^2 + y'^2 + z^2}} - G\rho_m \int_{-a}^{+a} dx' \int_{-\sqrt{a^2 - x'^2}}^{+\sqrt{a^2 - x'^2}} \frac{dy'}{\sqrt{a^2 + z^2 - 2z\sqrt{a^2 - x'^2 - y'^2}}}.$$

The y' integration in the first integral can be done, but the remaining x' integration will be complicated. The second y' integral cannot even be performed in closed form. This difficulty is a result of our poor choice of coordinates, whereby the boundary of the region of integration does not turn out to be a “natural” surface. ■
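Although the second y' integral above has no closed form, the reduced expression for g_z is easy to evaluate numerically. The sketch below assumes units with Gρ_m = 1 and a = 1, with the field point above the dome (z > a); all function names are ours.

The first function integrates the bracket in the last expression over the disk S in polar coordinates (s, φ). It substitutes s = a sin t to tame the square root at the rim. The second function is an independent check: it slices the hemisphere into thin disks of radius √(a² − z'²), each contributing −2πGρ_m dz'[1 − (z − z')/√(a² − z'² + (z − z')²)] on the axis. That contribution is the on-axis result of Example 4.2.3 with a → 0 and b → √(a² − z'²).

```python
import math


def simpson(f, lo, hi, n=400):
    """Composite Simpson rule for f on [lo, hi] with n subintervals (n made even)."""
    if n % 2:
        n += 1
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3


def hemisphere_gz_projection(z, a, n=400):
    """g_z/(G*rho_m) from the reduced Cartesian expression, integrated over the disk S
    in polar coordinates with s = a*sin(t), so sqrt(a^2 - s^2) = a*cos(t)."""
    def integrand(t):
        s = a * math.sin(t)
        root = a * math.cos(t)  # sqrt(a^2 - s^2)
        bracket = 1 / math.sqrt(s * s + z * z) - 1 / math.sqrt(a * a + z * z - 2 * z * root)
        return bracket * s * root  # s ds = (a sin t)(a cos t) dt
    return 2 * math.pi * simpson(integrand, 0.0, math.pi / 2, n)


def hemisphere_gz_slices(z, a, n=400):
    """g_z/(G*rho_m) for z > a by stacking thin disks of radius sqrt(a^2 - z'^2)."""
    def disk(zp):
        return 1 - (z - zp) / math.sqrt(a * a - zp * zp + (z - zp) ** 2)
    return -2 * math.pi * simpson(disk, 0.0, a, n)


print(hemisphere_gz_projection(2.0, 1.0), hemisphere_gz_slices(2.0, 1.0))
```

Far from the hemisphere both results approach −GM/z² with M = (2/3)πa³ρ_m, as expected.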


The example of the hemisphere in Cartesian coordinates indicates the 
difficulty encountered when the boundaries of the integration region do not 
match the primary surfaces of the coordinate system. In the next example, 
we calculate the gravitational field of the hemisphere in spherical coordinates. 

Example 4.3.2. The spherical coordinate system makes the problem so manageable that we can consider a more general mass distribution.

Figure 4.15: The gravitational field of a solid cone with a spherically curved top.

We will calculate the gravitational field of a cone-shaped segment of a solid sphere of half-angle α as shown in Figure 4.15. We are interested in the field at a point P on the axis of the cone as shown. Since e_θ and e_φ cannot be defined at P (why?), we expect, from physical intuition, that the only surviving component of the gravitational field is radial. This component is obtained by dotting the vector field with e_r:

$$g_r = \mathbf{e}_r\cdot\mathbf{g} = -G\rho_m \iiint_\Omega \frac{r'^2\sin\theta'\,dr'\,d\theta'\,d\varphi'}{|\mathbf{r}-\mathbf{r}'|^3}\,\mathbf{e}_r\cdot(\mathbf{r}-\mathbf{r}') = -G\rho_m \int_0^{2\pi} d\varphi' \int_0^a r'^2\,dr' \int_0^\alpha \frac{\sin\theta'\,(r - r'\cos\theta')\,d\theta'}{(r^2 + r'^2 - 2rr'\cos\theta')^{3/2}}.$$


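To do the integrations, we use the technique of differentiating inside the integral and note that

$$\frac{r - r'\cos\theta'}{(r^2 + r'^2 - 2rr'\cos\theta')^{3/2}} = -\frac{\partial}{\partial r}\,\frac{1}{\sqrt{r^2 + r'^2 - 2rr'\cos\theta'}}.$$

Therefore, the integral becomes

$$g_r = 2\pi G\rho_m \int_0^a r'^2\,dr' \int_0^\alpha \sin\theta'\,\frac{\partial}{\partial r}\frac{1}{\sqrt{r^2 + r'^2 - 2rr'\cos\theta'}}\,d\theta' = 2\pi G\rho_m \frac{d}{dr}\int_0^a r'^2\,dr' \int_0^\alpha \frac{\sin\theta'\,d\theta'}{\sqrt{r^2 + r'^2 - 2rr'\cos\theta'}}$$
$$= 2\pi G\rho_m \frac{d}{dr}\int_0^a r'^2\,dr' \left(\frac{1}{rr'}\sqrt{r^2 + r'^2 - 2rr'\cos\theta'}\,\Big|_0^\alpha\right) = 2\pi G\rho_m \frac{d}{dr}\left[\frac{1}{r}\int_0^a r'\,dr' \left(\sqrt{r^2 + r'^2 - 2rr'\cos\alpha} - \sqrt{(r - r')^2}\right)\right]$$
$$= 2\pi G\rho_m \frac{d}{dr}\left[\frac{1}{r}\int_0^a r'\,dr' \left(\sqrt{r^2 + r'^2 - 2rr'\cos\alpha} - |r - r'|\right)\right]. \tag{4.35}$$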


The integral involving the absolute value can be done easily. However, we have 
to be careful about the relative size of r, a, and r'. We therefore consider two cases: 
r > a and r < a. Keeping in mind that r' < a, the first case yields 
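$$\int_0^a r'\,|r - r'|\,dr' = \int_0^a r'(r - r')\,dr' = \frac{ra^2}{2} - \frac{a^3}{3}, \qquad r > a.$$

For the second case, we have to split the interval of integration in two, and write the absolute value accordingly:

$$\int_0^a r'\,|r - r'|\,dr' = \int_0^r r'(r - r')\,dr' + \int_r^a r'(r' - r)\,dr' = \frac{r^3}{3} + \frac{a^3}{3} - \frac{ra^2}{2}, \qquad r < a.$$

Substituting these in Equation (4.35), we get

$$g_r = \begin{cases} 2\pi G\rho_m \dfrac{d}{dr}\left[-\dfrac{a^2}{2} + \dfrac{a^3}{3r} + \dfrac{1}{r}\displaystyle\int_0^a r'\sqrt{r^2 + r'^2 - 2rr'\cos\alpha}\,dr'\right] & \text{if } r > a,\\[2ex] 2\pi G\rho_m \dfrac{d}{dr}\left[-\dfrac{r^2}{3} - \dfrac{a^3}{3r} + \dfrac{a^2}{2} + \dfrac{1}{r}\displaystyle\int_0^a r'\sqrt{r^2 + r'^2 - 2rr'\cos\alpha}\,dr'\right] & \text{if } r < a. \end{cases}$$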

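The remaining integral can also be performed with the result

$$\frac{1}{r}\int_0^a r'\sqrt{r^2 + r'^2 - 2rr'\cos\alpha}\,dr' = -\frac{r^2}{12}(1 - 3\cos 2\alpha) + \left(\frac{a^2}{3r} - \frac{a\cos\alpha}{6} + \frac{r}{12}(1 - 3\cos 2\alpha)\right)\sqrt{r^2 + a^2 - 2ra\cos\alpha} + \frac{r^2\cos\alpha\,\sin^2\alpha}{2}\,\ln\frac{a - r\cos\alpha + \sqrt{r^2 + a^2 - 2ra\cos\alpha}}{r - r\cos\alpha}.$$

The special case of α = π, i.e., a full sphere, is very important, because historically it motivated the rapid development of integral calculus. For this case, we have

$$\frac{1}{r}\int_0^a r'\sqrt{r^2 + r'^2 + 2rr'}\,dr' = \frac{r^2}{6} + (a + r)\left(\frac{a^2}{3r} - \frac{r}{6} + \frac{a}{6}\right),$$

whereby the radial component of the field becomes

$$g_r = 2\pi G\rho_m \frac{d}{dr}\begin{cases} -\dfrac{a^2}{2} + \dfrac{a^3}{3r} + \dfrac{r^2}{6} + (a + r)\left(\dfrac{a^2}{3r} - \dfrac{r}{6} + \dfrac{a}{6}\right) & \text{if } r > a,\\[2ex] -\dfrac{r^2}{3} - \dfrac{a^3}{3r} + \dfrac{a^2}{2} + \dfrac{r^2}{6} + (a + r)\left(\dfrac{a^2}{3r} - \dfrac{r}{6} + \dfrac{a}{6}\right) & \text{if } r < a \end{cases}$$
$$= 2\pi G\rho_m \frac{d}{dr}\begin{cases} \dfrac{2a^3}{3r} & \text{if } r > a,\\[2ex] a^2 - \dfrac{r^2}{3} & \text{if } r < a \end{cases} \;=\; \begin{cases} -\dfrac{GM}{r^2} & \text{if } r > a,\\[2ex] -\dfrac{4\pi}{3}G\rho_m r & \text{if } r < a, \end{cases}$$

where M = (4π/3)a³ρ_m is the mass of the sphere.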
The first result is the well-known fact that the field outside a uniform sphere is the
same as the field of a point mass with the same mass concentrated at the center
of the original sphere. The second result, usually obtained in electrostatics by using
Gauss’s law, would not have been obtained if we had not used absolute values when
extracting a square root. ■
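The closed form for the remaining integral is long and easy to mistype, so a numerical cross-check is worthwhile. The sketch below assumes units with Gρ_m = 1, and the function names are ours. It does three things:

- It compares the closed form with Simpson's rule.
- It rebuilds g_r by differentiating the bracket of (4.35) with a central difference.
- It compares that g_r with the original double integral at a point beyond the cap (r > a).

For α = π the sketch reproduces both −GM/r² outside and −(4π/3)Gρ_m r inside. The inside value comes out right only because |r − r'| was kept as an absolute value.

```python
import math


def simpson(f, lo, hi, n=400):
    """Composite Simpson rule for f on [lo, hi] with n subintervals (n made even)."""
    if n % 2:
        n += 1
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3


def K_closed(r, a, alpha):
    """Closed form of (1/r) * int_0^a r' sqrt(r^2 + r'^2 - 2 r r' cos(alpha)) dr' from the text."""
    c, s2 = math.cos(alpha), math.sin(alpha) ** 2
    S = math.sqrt(r * r + a * a - 2 * r * a * c)
    q = 1 - 3 * math.cos(2 * alpha)
    return (-(r * r / 12) * q
            + (a * a / (3 * r) - a * c / 6 + r * q / 12) * S
            + (r * r * c * s2 / 2) * math.log((a - r * c + S) / (r - r * c)))


def K_numeric(r, a, alpha, n=400):
    """The same integral by Simpson's rule."""
    c = math.cos(alpha)
    return simpson(lambda rp: rp * math.sqrt(r * r + rp * rp - 2 * r * rp * c), 0.0, a, n) / r


def g_r_closed(r, a, alpha, h=1e-5):
    """g_r/(G*rho_m) = 2*pi * d/dr[bracket], bracket from the absolute-value integrals;
    d/dr by central difference (r must not lie within h of a)."""
    def F(s):
        if s > a:
            abs_part = -a * a / 2 + a ** 3 / (3 * s)
        else:
            abs_part = -s * s / 3 - a ** 3 / (3 * s) + a * a / 2
        return abs_part + K_closed(s, a, alpha)
    return 2 * math.pi * (F(r + h) - F(r - h)) / (2 * h)


def g_r_direct(r, a, alpha, n=200):
    """g_r/(G*rho_m) from the original double integral over r' and theta' (use r > a)."""
    def inner(rp):
        f = lambda t: math.sin(t) * (r - rp * math.cos(t)) / (r * r + rp * rp - 2 * r * rp * math.cos(t)) ** 1.5
        return rp * rp * simpson(f, 0.0, alpha, n)
    return -2 * math.pi * simpson(inner, 0.0, a, n)


print(g_r_closed(2.0, 1.0, math.pi / 3), g_r_direct(2.0, 1.0, math.pi / 3))
```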

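Example 4.3.3. A uniformly charged hollow cylinder of length L and volume charge density ρ_q has an inner radius a and an outer radius b (see Figure 4.16). The cylinder is rotating with constant angular speed ω about its axis. We want to find the magnetic field produced by this motion of charges. We note that the problem has an azimuthal symmetry, so we do not lose generality if we choose our coordinates so that our field point P lies in the xz-plane. This is equivalent to setting φ = 0.

We use cylindrical coordinates in Equation (4.17) to find the magnetic field. For a general field point, we have

$$\mathbf{r} = \rho\,\mathbf{e}_\rho + z\,\mathbf{e}_z, \qquad \mathbf{r}' = \rho'\,\mathbf{e}_{\rho'} + z'\,\mathbf{e}_z, \qquad \mathbf{v}(\mathbf{r}') = \omega\rho'\,\mathbf{e}_{\varphi'},$$

so that

$$\mathbf{v}(\mathbf{r}')\times(\mathbf{r}-\mathbf{r}') = \omega\rho\rho'\,\mathbf{e}_{\varphi'}\times\mathbf{e}_\rho - \omega\rho'^2\,\mathbf{e}_{\varphi'}\times\mathbf{e}_{\rho'} + \omega\rho'(z - z')\,\mathbf{e}_{\varphi'}\times\mathbf{e}_z$$
$$= -\omega\rho\rho'\cos\varphi'\,\mathbf{e}_z + \omega\rho'^2\,\mathbf{e}_z + \omega\rho'(z - z')\,\mathbf{e}_{\rho'} = \omega\rho'(z - z')\,\mathbf{e}_{\rho'} + \omega(\rho'^2 - \rho\rho'\cos\varphi')\,\mathbf{e}_z.$$

Substituting all these results in Equation (4.17), we obtain

$$\mathbf{B} = \iiint_\Omega \frac{\omega k_m(\rho_q\,\rho'\,d\rho'\,d\varphi'\,dz')\,[\rho'(z - z')\,\mathbf{e}_{\rho'} + (\rho'^2 - \rho\rho'\cos\varphi')\,\mathbf{e}_z]}{\{\rho^2 + \rho'^2 - 2\rho\rho'\cos\varphi' + (z - z')^2\}^{3/2}}.$$

Figure 4.16: The charged rotating hollow cylinder produces a magnetic field due to the motion of charges.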
The cylindrical components are obtained by dotting this equation with the cylindrical unit vectors at P:

$$B_\rho = \mathbf{B}\cdot\mathbf{e}_\rho = \omega k_m\rho_q \int_0^{2\pi}\!\int_a^b\!\int_{-L/2}^{L/2} \frac{\rho'^2(z - z')\cos\varphi'\,d\rho'\,d\varphi'\,dz'}{\{\rho^2 + \rho'^2 - 2\rho\rho'\cos\varphi' + (z - z')^2\}^{3/2}},$$
$$B_\varphi = \mathbf{B}\cdot\mathbf{e}_\varphi = \omega k_m\rho_q \int_0^{2\pi}\!\int_a^b\!\int_{-L/2}^{L/2} \frac{\rho'^2(z - z')\sin\varphi'\,d\rho'\,d\varphi'\,dz'}{\{\rho^2 + \rho'^2 - 2\rho\rho'\cos\varphi' + (z - z')^2\}^{3/2}} = 0,$$
$$B_z = \mathbf{B}\cdot\mathbf{e}_z = \omega k_m\rho_q \int_0^{2\pi}\!\int_a^b\!\int_{-L/2}^{L/2} \frac{(\rho'^3 - \rho\rho'^2\cos\varphi')\,d\rho'\,d\varphi'\,dz'}{\{\rho^2 + \rho'^2 - 2\rho\rho'\cos\varphi' + (z - z')^2\}^{3/2}}.$$

The middle equation gives zero as a result of the φ' integration. It turns out that the z' and ρ' integrations of the remaining integrals can be performed in closed form. However, the results are very complicated and will not be reproduced here. Furthermore, the φ' integration has no closed form and must be done numerically.

We can also obtain the components of B in other coordinate systems by dotting B into the corresponding unit vectors. The reader may check, for example, that in Cartesian coordinates, B_x = B · e_x is the same as B_ρ above and B_y is the same as B_φ, i.e., B_y = 0. This is due to the particular choice of our coordinate system (φ = 0). ■
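Even though the full closed-form expressions are messy, the triple integral for B_z is straightforward to evaluate numerically. This sketch assumes units with ωk_mρ_q = 1 and an origin at the center of the cylinder, so that z' runs from −L/2 to L/2; the names are ours. It evaluates B_z with a nested three-dimensional Simpson rule.

For a field point on the axis (ρ = 0) all three integrations can be done by hand. The result is B_z = 2πωk_mρ_q Σ u[√(b² + u²) − √(a² + u²)], summed over u = L/2 − z and u = L/2 + z, and the sketch compares the numerical value with it. For a long cylinder this tends to 2πωk_mρ_q(b² − a²) = μ₀ωρ_q(b² − a²)/2.

```python
import math


def simpson(f, lo, hi, n):
    """Composite Simpson rule for f on [lo, hi] with n subintervals (n made even)."""
    if n % 2:
        n += 1
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3


def cylinder_bz(rho, z, a, b, L, n_phi=40, n_rho=40, n_z=80):
    """B_z/(omega*k_m*rho_q) at (rho, phi = 0, z): nested Simpson rule over
    phi' in [0, 2*pi], rho' in [a, b], z' in [-L/2, L/2]."""
    def integrand(pp, rp, zp):
        d = rho ** 2 + rp ** 2 - 2 * rho * rp * math.cos(pp) + (z - zp) ** 2
        return (rp ** 3 - rho * rp ** 2 * math.cos(pp)) / d ** 1.5

    def over_z(pp, rp):
        return simpson(lambda zp: integrand(pp, rp, zp), -L / 2, L / 2, n_z)

    def over_rho(pp):
        return simpson(lambda rp: over_z(pp, rp), a, b, n_rho)

    return simpson(over_rho, 0.0, 2 * math.pi, n_phi)


def cylinder_bz_axis(z, a, b, L):
    """On-axis closed form: 2*pi * sum over u = L/2 - z, L/2 + z of u*(sqrt(b^2+u^2) - sqrt(a^2+u^2))."""
    return 2 * math.pi * sum(u * (math.sqrt(b * b + u * u) - math.sqrt(a * a + u * u))
                             for u in (L / 2 - z, L / 2 + z))


print(cylinder_bz(0.0, 0.5, 1.0, 2.0, 4.0), cylinder_bz_axis(0.5, 1.0, 2.0, 4.0))
```

Off the axis, the same `cylinder_bz` function can be called with ρ ≠ 0. The φ' integration is then the step that the text notes must be done numerically.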


4.4 Problems 

4.1. Differentiate Equation (4.2) to find the velocity and acceleration and 
compare with the expected results. 

4.2. By choosing a coordinate system properly, write down the simplest parametric equation for the following curves. In each case specify the range of the
parameter you use:

(a) a rectangle of sides a and b, lying in the xy-plane with center at the origin
and sides parallel to the axes;

(b) an ellipse with semi-major and semi-minor axes a and b;

(c) a helix wrapped around a cylinder with an elliptical cross section of the
type described in (b); and

(d) a helix wrapped around a cone.

4.3. Assume that the parametric equations of a line charge are x' = f(t), y' = g(t), z' = h(t). By writing everything in Equation (4.3) in Cartesian coordinates, show that

$$\mathbf{E}(x, y, z) = \int_a^b \frac{k_e\lambda(t)\sqrt{[f'(t)]^2 + [g'(t)]^2 + [h'(t)]^2}}{\{[x - f(t)]^2 + [y - g(t)]^2 + [z - h(t)]^2\}^{3/2}} \times \big([x - f(t)]\,\mathbf{e}_x + [y - g(t)]\,\mathbf{e}_y + [z - h(t)]\,\mathbf{e}_z\big)\,dt \tag{4.36}$$
and that

$$E_x = \int_a^b \frac{k_e\lambda(t)\sqrt{[f'(t)]^2 + [g'(t)]^2 + [h'(t)]^2}}{\{[x - f(t)]^2 + [y - g(t)]^2 + [z - h(t)]^2\}^{3/2}}\,[x - f(t)]\,dt,$$
$$E_y = \int_a^b \frac{k_e\lambda(t)\sqrt{[f'(t)]^2 + [g'(t)]^2 + [h'(t)]^2}}{\{[x - f(t)]^2 + [y - g(t)]^2 + [z - h(t)]^2\}^{3/2}}\,[y - g(t)]\,dt, \tag{4.37}$$
$$E_z = \int_a^b \frac{k_e\lambda(t)\sqrt{[f'(t)]^2 + [g'(t)]^2 + [h'(t)]^2}}{\{[x - f(t)]^2 + [y - g(t)]^2 + [z - h(t)]^2\}^{3/2}}\,[z - h(t)]\,dt,$$

and

$$\Phi(x, y, z) = \int_a^b \frac{k_e\lambda(t)\sqrt{[f'(t)]^2 + [g'(t)]^2 + [h'(t)]^2}}{\{[x - f(t)]^2 + [y - g(t)]^2 + [z - h(t)]^2\}^{1/2}}\,dt.$$

How is λ(t) related to λ(r')?

4.4. (a) Show that

$$\mathbf{e}_\rho\cdot\mathbf{e}_x = \frac{x}{\sqrt{x^2 + y^2}}, \qquad \mathbf{e}_\rho\cdot\mathbf{e}_y = \frac{y}{\sqrt{x^2 + y^2}}.$$


(b) Similarly, express e_φ · e_x and e_φ · e_y in Cartesian coordinates.

(c) Use (a), (b), and Equation (4.36) to find the general expressions for E_ρ and
E_φ as integrals in Cartesian coordinates similar to the integrals of Equation
(4.37).

4.5. (a) Find the nine dot products of all Cartesian and spherical unit vectors
and express the results in terms of Cartesian coordinates.

(b) Use (a) and Equation (4.36) to find general expressions for E_r, E_θ, and
E_φ as integrals in Cartesian coordinates similar to the integrals of Equation
(4.37).


4.6. Assume that the parametric equations of a line charge are ρ' = f(t), φ' = g(t), z' = h(t). By writing everything in Equation (4.3) in cylindrical coordinates, show that Equation (4.11) holds and that

$$E_\rho = \int_a^b \frac{k_e\lambda(t)\sqrt{[f'(t)]^2 + [f(t)]^2[g'(t)]^2 + [h'(t)]^2}}{\{\rho^2 + [f(t)]^2 - 2\rho f(t)\cos(\varphi - g(t)) + [z - h(t)]^2\}^{3/2}}\,[\rho - f(t)\cos(g(t) - \varphi)]\,dt, \tag{4.38}$$
$$E_\varphi = -\int_a^b \frac{k_e\lambda(t)\sqrt{[f'(t)]^2 + [f(t)]^2[g'(t)]^2 + [h'(t)]^2}}{\{\rho^2 + [f(t)]^2 - 2\rho f(t)\cos(\varphi - g(t)) + [z - h(t)]^2\}^{3/2}}\,f(t)\sin(g(t) - \varphi)\,dt, \tag{4.39}$$
$$E_z = \int_a^b \frac{k_e\lambda(t)\sqrt{[f'(t)]^2 + [f(t)]^2[g'(t)]^2 + [h'(t)]^2}}{\{\rho^2 + [f(t)]^2 - 2\rho f(t)\cos(\varphi - g(t)) + [z - h(t)]^2\}^{3/2}}\,[z - h(t)]\,dt, \tag{4.40}$$

and

$$\Phi = \int_a^b \frac{k_e\lambda(t)\sqrt{[f'(t)]^2 + [f(t)]^2[g'(t)]^2 + [h'(t)]^2}}{\{\rho^2 + [f(t)]^2 - 2\rho f(t)\cos(\varphi - g(t)) + [z - h(t)]^2\}^{1/2}}\,dt.$$

How is λ(t) related to λ(r')?

4.7. Use (4.11) to calculate Cartesian and spherical components of the electric 
field in terms of integrals in cylindrical variables similar to (4.38). 

4.8. Use the cylindrical coordinates for the integration variables of Example 
4.1.3, but calculate the Cartesian components of E. 


4.9. A uniformly charged infinitely thin circular ring of radius a has total 
charge Q. Place the ring in the xy-plane with its center at the origin. Use 
cylindrical coordinates. 

(a) Find the electrostatic potential at P with cylindrical coordinates (ρ, φ, z)
in terms of a single integral.

(b) Find the analytic form of the potential if P is on the z-axis (evaluate the
integral).

(c) Find the potential at a point in the xy-plane a distance 2a from the origin.
Give your answer as a number times k_eQ/a.

4.10. Write a general formula for Φ(r) of a charged curve in spherical coordinates.


4.11. A straight-line segment of length 2L is placed on the z-axis with its
midpoint at the origin. The segment has a linear charge density given by

$$\lambda(x, y, z) = \frac{Q}{|z| + a},$$

where Q and a are constants with a > 0. Find the electrostatic potential of
this charge distribution at a point on the x-axis in Cartesian coordinates.


4.12. Same as the previous problem, except that

$$\lambda(x, y, z) = \frac{aQ}{z^2 + a^2}.$$

Look up the integral in an integral table.

(a) Does anything peculiar happen at x = ±a? Based on the integration
result? Based on physical intuition? Look at the result carefully and reconcile
any conflict.

(b) What is the potential when L → ∞?
4.13. A segment of the parabola y = x²/a, with a a constant, extending
from x = 0 to x = L has a linear charge density given by

$$\lambda(x, y, z) = \frac{\lambda_0}{\sqrt{1 + (2x/a)^2}},$$

where λ₀ is a constant. Find the potential and the electric field at the point
(0, 0, z). What are Φ and E at (0, 0, a/2)? Simplify your results as much as
possible.


4.14. A circular ring of radius a is uniformly charged with linear density λ.

(a) Find an expression for each of the three components of the electric field
at an arbitrary point in space in terms of an integral in an appropriate coordinate system. Evaluate the integrals whenever possible.

(b) Find the components of the field at the point P shown in Figure 4.17.
Express your answers as a numerical multiple of k_eλ/a.

(c) Find the electrostatic potential at the point P shown in Figure 4.17. Express your answer as a numerical multiple of k_eλ.

For (b) and (c) you will need to evaluate certain integrals numerically.

4.15. Consider a uniform linear charge distribution in the form of an ellipse
with linear charge density λ. The semi-major and semi-minor axes of the ellipse are a and b, respectively. Use Cartesian coordinates and the parametric
equation of the ellipse.

(a) Write down the integrals that give the electric field and the electric potential at an arbitrary point P in space.

(b) Specialize to the case where P lies on the axis that is perpendicular to the
plane of the ellipse and passes through its center.

(c) Specialize (a) to the case where P lies on the line containing the minor
axis.

4.16. Consider a uniform linear charge distribution in the form of an ellipse
with linear charge density λ located in the xy-plane. The semi-major and
semi-minor axes of the ellipse are 2a and a, respectively.

(a) Write the Cartesian parameterization of the ellipse in terms of trigonometric functions.

(b) Write the integral that gives the Cartesian components of the electric field
at an arbitrary point (x, y, z) in space.

(c) Specialize to the point (a, 2a, 2a), and write your answer as a numerical
multiple of k_eλ/a.

Figure 4.17: The figure for Problems 4.14 and 4.21.

4.17. Consider a uniform linear charge distribution, with linear charge density
λ, in the form of an elliptical helix whose parametric equation is given by

x' = a cos t, y' = b sin t, z' = ct.

Use Cartesian coordinates.

(a) Write down the integrals that give the electric field and the electric potential at an arbitrary point P in space.

(b) Verify that when c = 0, you get the field and potential of an ellipse (see
Problem 4.15).

(c) Verify that when c = 0 = b, you get the field and potential of a straight
line segment.

(d) Verify that when c = 0 = b and a → ∞, you get the field of an infinite
straight line.

4.18. Find the three components of the electric field and the potential of Example 4.1.2 when a = −L/2 and b = L/2. Approximate the three components
of the electric field for the case where L ≫ x.

4 . 19 . Derive all relations in Equations (4.20) and (4.21). 

4 . 20 . Figure 4.18 shows a hyperbola y = \Jx 2 + a 2 . Only the segment be¬ 
tween x = 0 and x = a is charged uniformly with linear density A. 

(a) Write the expression for E as an integral in Cartesian coordinates. 

(b) Find the three components of E as integrals over x'. 

(c) Making the substitution x' = au, write each component as a numerical
multiple of $k_e\lambda/a$.

4.21. A circular ring of radius a is uniformly charged with linear density $\lambda$.
The ring rotates with angular speed $\omega$ about the axis perpendicular to the
plane of the ring, passing through its center.

Figure 4.18: The segment of the hyperbola that is charged.
(a) Find an expression for each of the three components of the magnetic
field at an arbitrary point in space in terms of an integral in an appropriate 
coordinate system. Evaluate the integrals whenever possible. 

(b) Find the components of the field at the point P shown in Figure 4.17.
Express your answers as numerical multiples of $k_m\lambda\omega$. (You will need to
evaluate some integrals numerically!)

4.22. An elliptical conducting ring of semi-major axis a and semi-minor axis 
b carries a current I. 

(a) Find an expression for each of the three Cartesian components of the 
magnetic field at an arbitrary point in space in terms of an integral in the 
Cartesian coordinate system. 

(b) Find an integral expression for the components of the field at a point on
the line perpendicular to the ellipse that passes through its center.

4.23. Perform the integrals for $E_x$, $E_y$, and $E_z$ of Example 4.2.1 when the
field point is on the z-axis. Hint: You can get $E_x$ and $E_y$ without doing the
integrals.

4.24. Assume that the parametric equations of a current loop are $x' = f(t)$,
$y' = g(t)$, $z' = h(t)$. By writing everything in Equation (4.18) in Cartesian
coordinates, show that

$$B_x(\mathbf{r}) = k_m I \int_a^b \frac{g'(t)\,[z - h(t)] - h'(t)\,[y - g(t)]}{\{[x - f(t)]^2 + [y - g(t)]^2 + [z - h(t)]^2\}^{3/2}}\,dt,$$

$$B_y(\mathbf{r}) = k_m I \int_a^b \frac{h'(t)\,[x - f(t)] - f'(t)\,[z - h(t)]}{\{[x - f(t)]^2 + [y - g(t)]^2 + [z - h(t)]^2\}^{3/2}}\,dt,$$

$$B_z(\mathbf{r}) = k_m I \int_a^b \frac{f'(t)\,[y - g(t)] - g'(t)\,[x - f(t)]}{\{[x - f(t)]^2 + [y - g(t)]^2 + [z - h(t)]^2\}^{3/2}}\,dt,$$

where a and b are the initial and final values of the parameter t.
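As an optional numerical sanity check of these three integrals, the sketch below evaluates them with a midpoint rule. It is only an illustration under stated assumptions: the loop is passed in as Python callables `f`, `g`, `h` and their derivatives (hypothetical names), and the constant $k_m I$ is set to 1. For a unit circle traversed counterclockwise, the on-axis result should reproduce the familiar $2\pi k_m I a^2/(a^2 + z^2)^{3/2}$, with vanishing transverse components.

```python
import math

def loop_field(point, f, g, h, df, dg, dh, t_a, t_b, steps=2000, km_I=1.0):
    # Midpoint-rule evaluation of the Cartesian integrals of Problem 4.24 for a loop
    # x' = f(t), y' = g(t), z' = h(t), with t running from t_a to t_b.
    # km_I stands for the constant k_m * I (an illustrative choice of 1).
    x, y, z = point
    dt = (t_b - t_a) / steps
    bx = by = bz = 0.0
    for i in range(steps):
        t = t_a + (i + 0.5) * dt
        rx, ry, rz = x - f(t), y - g(t), z - h(t)        # components of r - r'
        denom = (rx * rx + ry * ry + rz * rz) ** 1.5
        # Components of dl' x (r - r') with dl' = (f', g', h') dt.
        bx += (dg(t) * rz - dh(t) * ry) / denom
        by += (dh(t) * rx - df(t) * rz) / denom
        bz += (df(t) * ry - dg(t) * rx) / denom
    return km_I * bx * dt, km_I * by * dt, km_I * bz * dt

def zero(t):
    return 0.0

def minus_sin(t):
    return -math.sin(t)

# Unit circle in the xy-plane, counterclockwise; field point on the axis at z = 0.5.
B = loop_field((0.0, 0.0, 0.5), math.cos, math.sin, zero,
               minus_sin, math.cos, zero, 0.0, 2 * math.pi)
print(B, 2 * math.pi / (1 + 0.5**2) ** 1.5)
```

Because the integrand is periodic in t, the plain midpoint rule is already very accurate here; off-axis points need no change to the code.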


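4.25. By writing everything in Equation (4.18) in cylindrical coordinates,
show that

$$B_\rho = k_m I \int \frac{N_1\,d\rho' + \rho' N_2\,d\varphi' + \rho'\sin(\varphi' - \varphi)\,dz'}{\{\rho^2 + \rho'^2 - 2\rho\rho'\cos(\varphi - \varphi') + (z - z')^2\}^{3/2}},$$

$$B_\varphi = k_m I \int \frac{\rho' N_1\,d\varphi' - N_2\,d\rho' + [\rho - \rho'\cos(\varphi' - \varphi)]\,dz'}{\{\rho^2 + \rho'^2 - 2\rho\rho'\cos(\varphi - \varphi') + (z - z')^2\}^{3/2}},$$

$$B_z = k_m I \int \frac{-\rho\sin(\varphi' - \varphi)\,d\rho' + [\rho'^2 - \rho\rho'\cos(\varphi' - \varphi)]\,d\varphi'}{\{\rho^2 + \rho'^2 - 2\rho\rho'\cos(\varphi - \varphi') + (z - z')^2\}^{3/2}},$$

where

$$N_1 = (z - z')\sin(\varphi' - \varphi), \qquad N_2 = (z - z')\cos(\varphi' - \varphi).$$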



Figure 4.19: The figure for Problem 4.28. 


4.26. Derive Equation (4.27). 

4.27. Derive Equation (4.29) from Equation (4.28). 

4.28. A square of side 2a is uniformly charged with surface density $\sigma$.

(a) Find the electrostatic potential at an arbitrary point in space. Do one 
of the integrals and express your answer in terms of a single integral in an 
appropriate coordinate system. 

(b) Find the potential at a point a distance a directly above the midpoint of
one of the sides as shown in Figure 4.19. Express your answer as a numerical
multiple of $k_e\sigma a$.


4.29. The area in the xy-plane shown in Figure 4.21 is uniformly charged
with surface charge density $\sigma$. The equation of the parabolic boundary is
$y = x^2/a$. Assume that the observation point (field point) P is on the z-axis
at z = a.

(a) Derive the Cartesian components of the electric field at P as double
integrals.

(b) Do the y' integration first and then the x' integration to find the
components of the electric field. Write your answers as numerical multiples of $k_e\sigma$.
You will need to evaluate certain integral(s) numerically.

4.30. Using cylindrical coordinates, find the electrostatic field of a uniformly
charged circular disk of charge density $\sigma$ and radius a:

(a) at an arbitrary point in space;

(b) at an arbitrary point on the perpendicular axis of the disk; and

(c) at an arbitrary point in the plane of the disk.

(d) For (b), consider the case of infinite radius and compare your result with
the infinite rectangle discussed in introductory physics books and Example
4.2.3.

Figure 4.20: The region of the xy-plane that is charged.

Figure 4.21: The shaded region is uniformly charged.

4.31. Figure 4.20 shows a region of the xy-plane that is uniformly charged
with surface charge density $\sigma$. The boundary of the region is given in a
polar/cylindrical coordinate system by $\rho = a\cos(2\varphi)$ with $-\pi/4 < \varphi < \pi/4$.
We are interested in the electrostatic potential at a point P on the z-axis with
z = a.

(a) Write the position vectors of P and P' (a typical source point) in cylindrical
coordinates. Now evaluate $|\mathbf{r} - \mathbf{r}'|$.

(b) Write the expression for $dq(\mathbf{r}')$ in cylindrical coordinates.

(c) Write the expression for the potential $\Phi$ as a double integral in cylindrical
coordinates.

(d) Perform one of the integrations, and write your final answer as a single
integral.

(e) Find the value of the potential as a numerical multiple of $k_e\sigma a$.

4.32. A cylindrical shell of radius a and length L is uniformly charged with
surface charge density $\sigma$. Using an appropriate coordinate system and axis
orientation:

(a) Find the electric field at an arbitrary point in space. 

(b) Now let the length go to infinity and find a closed-form expression for the 
field in (a). You will have to look up the integral in an integral table. 

(c) Find the expression of the field for a point outside and a point inside the 
cylinder. 

4.33. A uniformly charged disk of radius a and surface charge density $\sigma$ is
in the xy-plane with its center at the origin and is rotating about its
perpendicular axis with angular frequency $\omega$.

(a) Find the cylindrical components of the magnetic field produced at a point
$P = (\rho, 0, z)$ as double integrals in cylindrical coordinates.

(b) Now assume that P is on the z-axis and find the components of $\mathbf{B}$ by
performing all the integrals involved.
4.34. An electrically charged disk of radius a is rotating about its
perpendicular axis with angular frequency $\omega$. Its surface charge density is given in
cylindrical coordinates by $\sigma = (\sigma_0/a^2)\rho^2$, where $\sigma_0$ is a constant.

(a) Find the Cartesian components of the magnetic field produced at an
arbitrary point $P = (\rho, 0, z)$ as double integrals in cylindrical coordinates.

(b) Now assume that P is on the z-axis and find the components of $\mathbf{B}$ by
performing all the integrals involved.

4.35. Express the components of $\mathbf{g}$ of Example 4.2.4 in Cartesian and
cylindrical coordinates in terms of integrals similar to Equation (4.33).

4.36. A conic surface of (maximum) radius a and half-angle $\alpha$ is uniformly
charged with surface density $\sigma$.

(a) Find the three components of the electric field at a point on the cone’s axis
a distance r from its vertex. Express your answers in terms of single integrals
in an appropriate coordinate system.

(b) Find the components of the field at $r = a/\sqrt{3}$ when $\alpha = \pi/6$. By
evaluating integrals numerically if necessary, express your answer as a numerical
multiple of $k_e\sigma$.

4.37. A cone with half-angle $\alpha$, the distance of whose vertex from its circular
rim is L, is rotating with angular speed $\omega$ about its axis. Electric charge
is distributed uniformly on the cone with surface charge density $\sigma$. Use the
coordinate system appropriate for this geometry.

(a) Express the components of the magnetic field produced at an arbitrary
point in space in terms of double integrals. Evaluate those components whose
integrals are easily done.

(b) Move the field point to the axis of the cone, and write the components
of the field in terms of single integrals. Evaluate the remaining components
whose integrals are easily done.

(c) Now assume that $\alpha = \pi/3$, and express the magnitude of the field on the
axis at a distance L from the vertex of the cone as a number times $k_m\omega\sigma L$.


4.38. A uniformly charged solid cylinder of length L, radius a, and total
charge q is rotated about its axis with angular speed $\omega$. Find the magnetic
field at a point on this axis.


4.39. Use cylindrical coordinates to calculate the gravitational field of the
hemisphere of Example 4.3.1 at a point on the z-axis.

(a) Show that

$$g_z = 2\pi G\rho_m \left\{ \sqrt{z^2 + a^2} - |z| - \frac{(a^2 + z^2)^{3/2} - |a - z|\,(a^2 + z^2 + az)}{3z^2} \right\},$$

with the other components being zero.

(b) Simplify this expression for points outside (z < 0 and z > a), and inside
(0 < z < a).

(c) Using the result of (b), find the gravitational field of a hemisphere whose
flat side points up.

(d) Add the results of (b) and (c) to find the field of a full sphere.
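One way to gain confidence in the closed form of part (a) before simplifying it is to compare it with a brute-force sum over thin disks. The sketch below does that under explicit assumptions: the hemisphere is taken to fill $0 \le z' \le a$ with its flat face on the xy-plane (consistent with the regions named in part (b)), the field points toward the mass, and $G\rho_m = 1$, $a = 1$ are arbitrary illustrative values.

```python
import math

def gz_formula(z, a=1.0, G_rho=1.0):
    # Closed form of Problem 4.39(a); G_rho stands for G * rho_m. Not defined at z = 0.
    s = math.sqrt(z * z + a * a)
    bracket = s**3 - abs(a - z) * (a * a + z * z + a * z)
    return 2 * math.pi * G_rho * (s - abs(z) - bracket / (3 * z * z))

def gz_slices(z, a=1.0, G_rho=1.0, steps=20_000):
    # Independent check (assumed geometry: solid fills 0 <= z' <= a, flat face down).
    # A thin disk of radius R = sqrt(a^2 - z'^2) and surface density rho_m dz' pulls a
    # point on its axis with strength 2 pi G sigma (1 - |d|/sqrt(d^2 + R^2)), d = z - z'.
    dz = a / steps
    total = 0.0
    for i in range(steps):
        zp = (i + 0.5) * dz
        d = z - zp
        r2 = a * a - zp * zp
        pull = 2 * math.pi * G_rho * dz * (1 - abs(d) / math.sqrt(d * d + r2))
        total -= math.copysign(pull, d)   # toward the disk: negative when the point is above it
    return total

for z in (-1.0, 0.4, 2.0):
    print(z, gz_formula(z), gz_slices(z))
```

Far from the hemisphere the closed form should approach the point-mass value $-G(2\pi a^3\rho_m/3)/z^2$, which gives a further quick check.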
Figure 4.22: The segment of a cylinder with uniform charge density used in Problem
4.41.


4.40. Find the moment of inertia of a uniform solid cone of mass M and
half-angle $\alpha$ cut out of a solid sphere of radius a. What is the moment of
inertia of a whole solid sphere?

4.41. A solid cylinder of length L has a cross section which is in the shape
of a segment of an annular ring with outer radius b and inner radius a. It
is subtended by an angle $\alpha$ and is uniformly charged with total charge q
(Figure 4.22). Find the electric field at:

(a) an arbitrary point in space; and 

(b) a point on the axis of the ring. 

(c) What is the answer to (b) if we have a complete ring? 

(d) What is the answer to (a) if we have a complete ring that is infinitely
long? Consider the three regions: $\rho < a$, $a < \rho < b$, and $\rho > b$.

4.42. Find the moment of inertia of the (incomplete) cylinder of the previous 
problem about the perpendicular axis passing through the common center of 
the inner and outer radii. Assume that the total mass is M. From this result 
obtain the moment of inertia of a hollow as well as a solid cylinder. 






Chapter 5 

Dirac Delta Function 


Paul Adrien Maurice Dirac, one of the most inventive mathematical physicists
of all time, co-founder of quantum theory, inventor of relativistic quantum 
mechanics in the form of an equation which bears his name, predictor of the 
existence of anti-matter, clarifier of the concept of spin, and contributor to the 
unraveling of the mathematical difficulties associated with the quantization 
of the general theory of relativity, came across the subject matter of this 
chapter in his study of quantum mechanical scattering. In order to appreciate 
the usefulness of this function, we shall start with an intuitive approach drawn 
from electrostatics. 


5.1 One-Variable Case 

Consider a straight linear charge distribution of length L with uniform charge
density as shown in Figure 5.1(a). If the total charge of the line segment is q,
then the linear charge density will be $\lambda = q/L$. We are interested in the graph
of the function describing the linear density in the interval $(-\infty, +\infty)$.
Assuming that the midpoint of the segment is $x_0$ and its length L, we can easily
draw the graph of the function. This is shown in Figure 5.1(b). The graph
is that of a function that is zero for values less than $x_0 - L/2$, $q/L$ for values
between $x_0 - L/2$ and $x_0 + L/2$, and zero again for values greater than
$x_0 + L/2$. Let us call this function $\lambda(x)$. Then, we can write

$$\lambda(x) = \begin{cases} 0 & \text{if } x < x_0 - L/2, \\ q/L & \text{if } x_0 - L/2 < x < x_0 + L/2, \\ 0 & \text{if } x > x_0 + L/2. \end{cases}$$

Figure 5.1: (a) The charged line segment and (b) its linear density function.

Now suppose that we squeeze the segment on both sides so that the length
shrinks to L/2 without changing the position of the midpoint and the amount
of charge. The new function describing the linear charge density will now be

$$\lambda(x, x_0) = q \begin{cases} 0 & \text{if } x < x_0 - L/4, \\ 2/L & \text{if } x_0 - L/4 < x < x_0 + L/4, \\ 0 & \text{if } x > x_0 + L/4. \end{cases}$$

We have “factored out” q for later convenience. We have also introduced a
second argument to emphasize the dependence of the function on the midpoint.
Instead of one-half, we can shrink the segment to any fraction, still
keeping both the amount of charge and the midpoint unchanged. Shrinking
the size to L/n and renaming the function $\lambda_n(x, x_0)$ to reflect its dependence
on n gives

$$\lambda_n(x, x_0) = q \begin{cases} 0 & \text{if } x < x_0 - L/(2n), \\ n/L & \text{if } x_0 - L/(2n) < x < x_0 + L/(2n), \\ 0 & \text{if } x > x_0 + L/(2n), \end{cases}$$

which is depicted in Figure 5.2 for n = 10 as well as some smaller values of n.
The important property of $\lambda_n(x, x_0)$ is that its height increases at the same
time that its width decreases.

Figure 5.2: The linear density function as the length shrinks.

Instead of a charge distribution that abruptly changes from zero to some
finite value and just as abruptly drops to zero, let us consider a distribution
that smoothly rises to a maximum value and just as smoothly falls to zero.
There are many functions describing such a distribution. For example,

$$\lambda_n(x, x_0) = q\sqrt{\frac{n}{\pi}}\,e^{-n(x - x_0)^2}$$

has a peak of $q\sqrt{n/\pi}$ at $x = x_0$ and drops to smaller and smaller values as
we get farther and farther away from $x_0$ in either direction. This function is
plotted for various values of n in Figure 5.3. It is clear from the figure that
the “width” of the graph of $\lambda_n(x, x_0)$ gets smaller as $n \to \infty$.

In both cases $\lambda_n(x, x_0)$ is a true linear (charge) density in the sense that
its integral gives the total charge. This is evident in the first case because
of the way the function was defined. In the second case, once we integrate
$\lambda_n(x, x_0)$ from $-\infty$ to $+\infty$, we also obtain the total charge q. The region of
integration extends over all real numbers in the second case because at every
point of the real line we have some nonzero charge. Furthermore, we can
extend the interval of integration over all the real numbers even for the first
case, because the function vanishes outside the interval $(x_0 - L/(2n), x_0 + L/(2n))$
and no extra contribution to the integral arises. We thus write

$$\int_{-\infty}^{+\infty} \lambda_n(x, x_0)\,dx = q$$

for all such functions. It is convenient to divide by q and define new functions
$\delta_n(x, x_0)$ by

$$\delta_n(x, x_0) = \frac{\lambda_n(x, x_0)}{q}$$

Figure 5.3: The Gaussian bell-shaped curve approaches the Dirac delta function as the
width of the curve approaches zero. The value of n is 1 for the dashed curve, 4 for the
heavy curve, and 20 for the light curve.

Dirac delta
function defined

so that

$$\delta_n(x, x_0) = \begin{cases} 0 & \text{if } x < x_0 - L/(2n), \\ n/L & \text{if } x_0 - L/(2n) < x < x_0 + L/(2n), \\ 0 & \text{if } x > x_0 + L/(2n), \end{cases}$$

in the first case, and

$$\delta_n(x, x_0) = \sqrt{\frac{n}{\pi}}\,e^{-n(x - x_0)^2} \tag{5.1}$$

in the second case. Both these functions have the property that

$$\int_{-\infty}^{+\infty} \delta_n(x, x_0)\,dx = 1, \tag{5.2}$$

i.e., their integral over all the real numbers is one and, in particular,
independent of n.
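The claim that both families have unit area for every n, while their peaks grow, is easy to check numerically. The sketch below is an illustration only: it uses a composite midpoint rule over a wide finite window as a stand-in for the whole real line, and the values $x_0 = 0.3$ and $L = 1$ are arbitrary choices.

```python
import math

def delta_rect(x, x0, n, L=1.0):
    # Rectangular sequence: height n/L on the interval of width L/n centred at x0.
    return n / L if abs(x - x0) < L / (2 * n) else 0.0

def delta_gauss(x, x0, n):
    # Gaussian sequence of Equation (5.1): sqrt(n/pi) * exp(-n (x - x0)^2).
    return math.sqrt(n / math.pi) * math.exp(-n * (x - x0) ** 2)

def midpoint(func, a, b, steps=100_000):
    # Composite midpoint rule; a wide finite window [a, b] stands in for the real line.
    h = (b - a) / steps
    return h * sum(func(a + (i + 0.5) * h) for i in range(steps))

x0 = 0.3
for n in (1, 4, 20, 100):
    area_rect = midpoint(lambda x: delta_rect(x, x0, n), x0 - 2, x0 + 2)
    area_gauss = midpoint(lambda x: delta_gauss(x, x0, n), x0 - 10, x0 + 10)
    print(f"n = {n:3d}: rectangle area {area_rect:.4f}, Gaussian area {area_gauss:.6f}, "
          f"Gaussian peak {delta_gauss(x0, x0, n):.3f}")
```

The rectangle areas can differ from 1 in the third decimal place only because the grid does not line up exactly with the jumps; the Gaussian areas agree with Equation (5.2) to many digits.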


Box 5.1.1. The Dirac delta function $\delta(x, x_0)$ is defined as

$$\delta(x, x_0) = \lim_{n \to \infty} \delta_n(x, x_0) \tag{5.3}$$

and has the following property:

$$\int_{-\infty}^{+\infty} \delta(x, x_0)\,dx = 1. \tag{5.4}$$


Equation (5.4) follows from the fact that the integral in (5.2) is independent
of n. The Dirac delta function has infinite height and zero width at $x_0$, but
these two undefined quantities compensate for one another to give a finite area
under the “graph” of the function. The Dirac delta function is not a
well-behaved mathematical function as defined in elementary textbooks because at
the only point where it is nonzero, it is infinite! Nevertheless, this function has
been investigated rigorously in higher mathematics. For us, the Dirac delta
function is a convenient way of describing densities.

Although we have separated the arguments of the Dirac delta function
by a comma, the function depends only on the difference between the two
arguments. This becomes clear if we think of the Dirac delta function as the
limit of the exponential, because the latter is a function of $x - x_0$. We therefore
have the important relation


$$\delta(x, x_0) = \delta(x - x_0). \tag{5.5}$$

In particular, since the delta function becomes infinite at $x = x_0$, we have

$$\delta(x, x_0)\big|_{x = x_0} = \delta(x - x_0)\big|_{x = x_0} = \delta(0) = \infty. \tag{5.6}$$

One can think of the last equality as an identity satisfied by the Dirac delta
function:


Box 5.1.2. The Dirac delta function is zero everywhere except at the point 
which makes its argument zero, in which case the Dirac delta function is 
infinite. 


Since the Dirac delta function is zero almost everywhere, we can shrink
the region of integration to a smaller interval. In fact,

$$\int_a^b \delta(x - x_0)\,dx = 1$$

as long as $x_0$ lies in the interval (a, b). If $x_0$ is outside the interval, then the
integral will be zero because the delta function would always be zero in the
region of integration. We summarize these results:


Box 5.1.3. The Dirac delta function satisfies the following relation:

$$\int_a^b \delta(x - x_0)\,dx = \begin{cases} 1 & \text{if } a < x_0 < b, \\ 0 & \text{otherwise.} \end{cases} \tag{5.7}$$

Equation (5.4) is a special case of this, because $-\infty < x_0 < +\infty$ for any
value of $x_0$.


5.1.1 Linear Densities of Points 


linear density
function

Any function $\lambda(x)$ whose integral over all real numbers is one is called a linear
density function. The $\delta_n$'s defined above are such functions. If we multiply
a linear density function by a physical quantity Q, the result will be a linear
density for Q. In fact, the relation $\lambda_n = q\,\delta_n$ used above is of exactly this form.
Thus, $Q_0\lambda(x)$ is a Q-linear density with total magnitude $Q_0$. Similarly, if M
represents a mass, then $M\lambda(x)$ is a linear mass density with total mass M.
Conversely, if f(x) describes the linear density of a physical quantity with total
magnitude Q, then $\lambda(x) = f(x)/Q$ is a linear density function.

δ function and
densities of point
charges and point
masses

Because of Equation (5.4) the Dirac delta function is a linear density
function. What kind of a distribution does it describe? To be specific, consider
$m\delta(x, x_0)$ with m designating mass. This function is zero everywhere except
at $x_0$. Thus, if it is to be a mass distribution, it has to be a point mass located
at $x_0$. Keep in mind that $m\delta(x, x_0)$ is a linear mass density, so that its integral
is the total mass m. The linear “density” of a point mass is infinite because
its length is zero, and this is precisely what $m\delta(x, x_0)$ describes. In fact, the
linear density of a point physical quantity of magnitude Q located at $x_0$ can
be written as $Q\delta(x, x_0) = Q\delta(x - x_0)$, or, generalizing,


Box 5.1.4. The linear density $\lambda(x)$ of N point physical quantities
$Q_1, Q_2, \ldots, Q_N$ located at $x_1, x_2, \ldots, x_N$, respectively, can be written as

$$\lambda(x) = \sum_{k=1}^{N} Q_k\,\delta(x - x_k). \tag{5.8}$$


We see that with the help of the Dirac delta function we can express discrete 
charge distributions (collection of point charges) in terms of functions. This 
is the most useful property of the Dirac delta function. 

Example 5.1.1. Three charges $-q$, $2q$, and $-q$ are located along the x-axis at
$-a$, the origin, and $+a$, respectively. How do we write the linear charge density for
such a charge distribution? We use Equation (5.8) with Q replaced by q:

$$\begin{aligned} \lambda(x) &= \sum_{k=1}^{3} q_k\,\delta(x - x_k) = -q\,\delta(x - (-a)) + 2q\,\delta(x - 0) - q\,\delta(x - a) \\ &= -q\,\delta(x + a) + 2q\,\delta(x) - q\,\delta(x - a). \end{aligned}$$

Note that the Dirac delta functions ensure that no electric charge is present
anywhere except at $x = a$, $x = -a$, and $x = 0$.
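Equation (5.8) turns a handful of point charges into a single function, and integrals against it collapse to finite sums: each $\delta(x - x_k)$ simply evaluates the rest of the integrand at $x_k$ (this anticipates Equation (5.10) of Section 5.1.2). The sketch below, with the illustrative choice q = a = 1, computes the moments $\int x^m\lambda(x)\,dx$ of Example 5.1.1 both ways: as the finite sum, and by replacing each delta with the Gaussian sequence $\delta_n$ and integrating numerically. The expected values are total charge 0, dipole moment 0, and second moment $-2qa^2$.

```python
import math

# Example 5.1.1 as (charge, position) pairs; q = a = 1 is an illustrative choice.
q, a = 1.0, 1.0
charges = [(-q, -a), (2 * q, 0.0), (-q, a)]

def moment(charges, m):
    # Integral of x**m * lambda(x) dx with lambda(x) = sum_k Q_k delta(x - x_k):
    # each delta term evaluates x**m at x_k, so the integral becomes a finite sum.
    return sum(Q * x**m for Q, x in charges)

def smeared_moment(charges, m, n=400, lo=-3.0, hi=3.0, steps=60_000):
    # Same integral with every delta replaced by the Gaussian sequence delta_n.
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * h
        density = sum(Q * math.sqrt(n / math.pi) * math.exp(-n * (x - xk) ** 2)
                      for Q, xk in charges)
        total += x**m * density
    return h * total

for m in range(3):
    print(m, moment(charges, m), round(smeared_moment(charges, m), 6))
```

For a Gaussian of width parameter n the second moment picks up an extra $1/(2n)$ per charge, but these extra terms cancel here because the total charge is zero.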


density of 
one-dimensional 
ionic crystal 


Example 5.1.2. A more interesting example of a linear charge distribution using
the Dirac delta function is that of an infinite array of point charges equally spaced
on a straight line having equal magnitudes and alternating in sign. This is a
one-dimensional model of ionic crystals.

Let us assume that each charge is $\pm q$, the spacing between it
and the neighboring charge is a, and that the charges start at $-\infty$ and extend to
$+\infty$ with one positive charge at the origin as shown in Figure 5.4. Then it is easy
to write the density of this distribution. It is

$$\lambda(x) = \sum_{k=-\infty}^{+\infty} (-1)^k\,q\,\delta(x - ka).$$

Figure 5.4: A one-dimensional ionic crystal. The black circles represent positive charges
and the white circles the negative charges.
Note that for odd k the charge is negative and for even k it is positive. This is
because we placed a positive charge at the origin. Had we chosen the origin to
be the site of a negative charge, the above arrangement would have shifted by one
spacing, i.e., $(-1)^k$ would be replaced by $(-1)^{k+1}$.


5.1.2 Properties of the Delta Function 


From a mathematical point of view, the most important property, which is
sometimes used to define the Dirac delta function, occurs when it multiplies a
“smooth”¹ function in an integrand. First look at an integral with a $\delta_n(x - x_0)$
inside. If the function f(x) multiplying $\delta_n(x - x_0)$ is smooth and n is large
enough, the product $f(x)\delta_n(x - x_0)$ practically vanishes outside a narrow
interval in which $\delta_n(x - x_0)$ is appreciably different from zero. For example,
if $n = 10^7$, $x = x_0 + 0.001$, and we use the exponential function of Equation
(5.1), then $\delta_n(x - x_0) \approx 0.08$, so that $f(x)\delta_n(x - x_0)$ is only about 8% of
$f(x_0)$, compared with about $1784\,f(x_0)$ at $x = x_0$ itself, assuming that f does
not change appreciably in the small interval of width 0.002 around $x_0$. For larger
values of n this drop is even sharper. In fact, no matter what function we choose,
there is always a large enough n such that the product $f(x)\delta_n(x - x_0)$ will drop
to as small a value as we please in as short an interval as we please. Therefore,
we can approximate the integral over all real numbers by an integral over that
small interval, say $(x_0 - \epsilon, x_0 + \epsilon)$. Then, we have

$$\begin{aligned} \int_{-\infty}^{+\infty} f(x)\,\delta_n(x - x_0)\,dx &\approx \int_{x_0 - \epsilon}^{x_0 + \epsilon} f(x)\,\delta_n(x - x_0)\,dx \\ &\approx f(x_0) \int_{x_0 - \epsilon}^{x_0 + \epsilon} \delta_n(x - x_0)\,dx \\ &\approx f(x_0) \int_{-\infty}^{+\infty} \delta_n(x - x_0)\,dx = f(x_0). \end{aligned}$$

The approximation in the second line follows from the fact that f(x) is almost
constant in the small interval $(x_0 - \epsilon, x_0 + \epsilon)$. The third approximation is a
result of the smallness of $\delta_n$ outside the interval, and the equality follows
because $\delta_n$ is a linear density function. The approximation above becomes an
equality once the limit $n \to \infty$ is taken, in which case $\delta_n$ becomes the Dirac
delta function. Thus, we have the important relation

$$\int_{-\infty}^{+\infty} f(x)\,\delta(x - x_0)\,dx = f(x_0). \tag{5.9}$$
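The numbers in the example above, and the convergence expressed by Equation (5.9), can be reproduced with a few lines of code. This is a sketch using the Gaussian sequence of Equation (5.1) and the illustrative choice $f = \cos$, $x_0 = 1$. For this particular f the smeared integral is known exactly, $\cos(x_0)\,e^{-1/(4n)}$ (a standard property of Gaussian averages), so the error shrinks roughly like $1/(4n)$.

```python
import math

def delta_gauss(u, n):
    # Gaussian sequence of Equation (5.1) written in terms of u = x - x0.
    return math.sqrt(n / math.pi) * math.exp(-n * u * u)

# The numbers quoted in the text for n = 10**7 and x - x0 = 0.001.
n = 10**7
print(f"delta_n(0.001) = {delta_gauss(0.001, n):.4f}")   # roughly 0.081
print(f"delta_n(0)     = {delta_gauss(0.0, n):.1f}")      # sqrt(n/pi), roughly 1784.1

def smeared_value(f, x0, n, steps=20_000):
    # Midpoint rule for the integral of f(x) * delta_n(x - x0) over x0 +/- 10/sqrt(n);
    # outside that window the Gaussian weight is below e^-100.
    w = 10 / math.sqrt(n)
    h = 2 * w / steps
    return h * sum(f(x0 - w + (i + 0.5) * h) * delta_gauss(-w + (i + 0.5) * h, n)
                   for i in range(steps))

for n in (1, 10, 100, 10_000):
    print(n, smeared_value(math.cos, 1.0, n), math.cos(1.0))
```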


1 In the present context, a smooth function is one that does not change abruptly when 
its argument changes by a small amount. 
integral of product
of $\delta(x - x_0)$ and
f(x) is simply
$f(x_0)$

distributions


This is equivalent to the following statement:

Box 5.1.5. The Dirac delta function satisfies

$$\int_a^b f(x)\,\delta(x - x_0)\,dx = \begin{cases} f(x_0) & \text{if } a < x_0 < b, \\ 0 & \text{otherwise.} \end{cases} \tag{5.10}$$

In words, the result of integration is the value of f at the root of the
argument of the delta function, provided this root is inside the range
of integration.


Example 5.1.3. In this example we illustrate some of the properties of the
Dirac delta function. For instance, the integral of $f(t)\delta(t)$ over a range that does
not contain t = 0 is zero, because the root of the argument of the Dirac delta
function (the point that makes the argument of the Dirac delta function zero),
namely t = 0, is outside the range of integration. The integral
$\int_{-\infty}^{+\infty} x\,\delta(x)\,dx$ is zero because the function x vanishes at the point x = 0 (the
root of the argument of the delta function). Also,

$$\int_{-\infty}^{3} \cos y\,\delta(y - \pi)\,dy = 0$$

because π, which makes the argument of the delta function vanish, lies outside the
range of integration. However,

$$\int_{-\infty}^{3.2} \cos y\,\delta(y - \pi)\,dy = \cos\pi = -1$$

because now π lies inside the range of integration.
The reader is urged to check the following results:

$$\int_{-\infty}^{+\infty} \cos y\,\delta(y - \pi)\,dy = -1, \qquad \int_{-\infty}^{+\infty} \sin z\,\delta(z)\,dz = 0,$$

$$\int_{-\infty}^{+\infty} x f(x)\,\delta(x)\,dx = 0, \qquad \int_{-\infty}^{2.8} \ln t\,\delta(t - e)\,dt = 1,$$

together with $\int e^{t}\,\delta(t)\,dt = 1$ over any range containing t = 0, and
$\int \cos y\,\delta(y + \pi)\,dy = 0$, $\int \cos y\,\delta(y - \pi)\,dy = 0$, and $\int \ln t\,\delta(t - e)\,dt = 0$
over ranges that exclude the roots $-\pi$, $\pi$, and e, respectively.


As noted earlier, the Dirac delta function is not an ordinary over-the-counter
function. Nevertheless, it is possible to study it, along with many
other “weird” functions called distributions, in a mathematically rigorous
and systematic way. It turns out that, in all physical applications,
distributions occur inside an integral, and once they do, Equation (5.10) tells us
how to manipulate such integrals. The result of integration is always well defined
because it is simply the value of a “good” function at a point, say $x_0$. In fact,
the result of integration is so nice that one can even define the derivative of
the Dirac delta function by differentiating (5.10) with respect to $x_0$. We leave
the details as an exercise and simply quote the result:



$$\int_{-\infty}^{+\infty} f(x)\,\delta'(x - x_0)\,dx = -f'(x_0). \tag{5.11}$$
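Equation (5.11) can also be seen numerically by replacing $\delta'$ with the derivative of the Gaussian sequence, $\delta_n'(u) = -2nu\sqrt{n/\pi}\,e^{-nu^2}$. In the sketch below, $f = \sin$ and $x_0 = 0.5$ are illustrative choices. Integrating by parts shows that the smeared integral equals $-\cos(x_0)\,e^{-1/(4n)}$, which tends to $-f'(x_0) = -\cos x_0$ as n grows.

```python
import math

def ddelta_gauss(u, n):
    # Derivative of the Gaussian sequence: d/du [sqrt(n/pi) exp(-n u^2)].
    return -2 * n * u * math.sqrt(n / math.pi) * math.exp(-n * u * u)

def integral_with_ddelta(f, x0, n, steps=20_000):
    # Midpoint-rule approximation of the integral of f(x) * delta_n'(x - x0) dx
    # over the window x0 +/- 10/sqrt(n), where essentially all the weight lies.
    w = 10 / math.sqrt(n)
    h = 2 * w / steps
    total = 0.0
    for i in range(steps):
        x = x0 - w + (i + 0.5) * h
        total += f(x) * ddelta_gauss(x - x0, n)
    return h * total

x0 = 0.5
for n in (10, 1_000, 100_000):
    print(n, integral_with_ddelta(math.sin, x0, n), -math.cos(x0))
```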


Higher order derivatives of the Dirac delta function can be obtained similarly. 
In fact, we have 


derivatives of 
Dirac delta 
function 


Box 5.1.6. The nth derivative of the Dirac delta function satisfies

$$\int_a^b f(x)\,\delta^{(n)}(x - x_0)\,dx = \begin{cases} (-1)^n f^{(n)}(x_0) & \text{if } a < x_0 < b, \\ 0 & \text{otherwise,} \end{cases} \tag{5.12}$$

where the superscript (n) indicates the nth derivative.


what happens
when the
argument of δ is
itself a function?

In many applications the argument of the Dirac delta function is not of
the simple form $(x - x_0)$, but may itself be a function g(x) whose derivative
is assumed to be continuous in (a, b). Since by Equation (5.6) the delta
function vanishes except when its argument is zero, in such a case one has
to concentrate on the roots of g(x), i.e., values c for which g(c) = 0. For
simplicity, first assume that there is only one root c of g in the interval (a, b)
and that $g'(c) > 0$. Then, since the Dirac delta function is zero everywhere
in the interval (a, b) except at x = c, we can shrink the region of integration
to $(c - \epsilon, c + \epsilon)$, and write

$$\int_a^b \delta(g(x))\,dx = \int_{c - \epsilon}^{c + \epsilon} \delta(g(x))\,dx.$$

Now make the change of variable $y = g(x)$, $dy = g'(x)\,dx$ with the appropriate
transformation of limits of integration to get

$$\int_{c - \epsilon}^{c + \epsilon} \delta(g(x))\,dx = \int_{g(c - \epsilon)}^{g(c + \epsilon)} \delta(y)\,\frac{dy}{g'(x)}.$$

With g(c) = 0 and $g'(c) > 0$, we conclude that g is increasing in the interval
$(c - \epsilon, c + \epsilon)$, that $g(c - \epsilon) < 0$, and that $g(c + \epsilon) > 0$. We can therefore write

$$\int_a^b \delta(g(x))\,dx = \frac{1}{g'(x)}\bigg|_{y = 0} = \frac{1}{g'(c)} > 0,$$

because zero is in the region of integration and y = 0 is equivalent to x = c
there.








When the delta function is multiplied by a smooth function f(x), a similar
argument as above (which is left to the reader) can be used to show that

$$\int_a^b f(x)\,\delta(g(x))\,dx = \begin{cases} \displaystyle\sum_k \frac{f(c_k)}{|g'(c_k)|} & \text{if } a < c_k < b, \\ 0 & \text{otherwise,} \end{cases} \tag{5.14}$$








provided $g'(c_k) \neq 0$. These results are sometimes written as an identity among
the delta functions.


Box 5.1.8. The Dirac delta function satisfies the following relation:

$$\delta(g(x)) = \sum_k \frac{\delta(x - c_k)}{|g'(c_k)|}, \qquad g'(c_k) \neq 0, \tag{5.15}$$

where the $c_k$ are all the roots of the equation g(x) = 0.
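The root rule behind Equations (5.14) and (5.15) is short enough to code directly, and it can be compared with a brute-force version in which $\delta$ is replaced by the Gaussian $\delta_n$ of Equation (5.1). In the sketch below, the function names are hypothetical, the roots must be supplied by the caller, and Example 5.1.4 is used with the illustrative choices $f(t) = e^t$ and $a = 1.5$, for which $\{f(-a) + f(a)\}/(2|a|) = \cosh(a)/|a|$.

```python
import math

def root_formula(f, dg, roots, a, b):
    # Sum of f(c)/|g'(c)| over the supplied simple roots c lying strictly inside (a, b).
    total = 0.0
    for c in roots:
        if a < c < b:
            slope = dg(c)
            if slope == 0:
                raise ValueError("the rule needs g'(c) != 0 at every root")
            total += f(c) / abs(slope)
    return total

def smeared(f, g, a, b, n=10**6, steps=400_000):
    # The same integral with delta replaced by the Gaussian delta_n, by the midpoint rule.
    h = (b - a) / steps
    total = 0.0
    for i in range(steps):
        t = a + (i + 0.5) * h
        total += f(t) * math.sqrt(n / math.pi) * math.exp(-n * g(t) ** 2)
    return h * total

# Example 5.1.4 with the illustrative choices f(t) = e^t and a = 1.5.
a_const = 1.5
exact = root_formula(math.exp, lambda t: 2 * t, [-a_const, a_const], -math.inf, math.inf)
approx = smeared(math.exp, lambda t: t * t - a_const**2, -4.0, 4.0)
print(exact, approx, math.cosh(a_const) / a_const)
```

The explicit check for $g'(c) = 0$ mirrors the proviso in Equation (5.15): at a double root the rule breaks down.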


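The formula analogous to Equation (5.14) involving the derivative of the Dirac
delta function is

$$\int_a^b f(x)\,\delta'(g(x))\,dx = \begin{cases} \displaystyle -\sum_k \frac{1}{g'(c_k)\,|g'(c_k)|}\left[f'(c_k) - \frac{f(c_k)\,g''(c_k)}{g'(c_k)}\right] & \text{if } a < c_k < b, \\ 0 & \text{otherwise.} \end{cases} \tag{5.16}$$

Each root thus contributes $-\frac{1}{g'(c_k)}\frac{d}{dx}\left[\frac{f(x)}{|g'(x)|}\right]_{x = c_k}$, which reduces to
$-f'(c_k)/(g'(c_k)\,|g'(c_k)|)$ when g is linear.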
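Because the $\delta'$ rule involves $g''$, a numerical cross-check is reassuring. For a simple root c, the substitution y = g(x) turns each contribution into $-\frac{1}{g'(c)}\frac{d}{dx}\left[f(x)/|g'(x)|\right]$ at x = c. The sketch below evaluates that root formula and compares it with the integral in which $\delta'$ is replaced by the derivative of the Gaussian $\delta_n$. The choices $f(x) = e^x$ and $g(x) = x^2 - 1$ are illustrative. The root at +1 happens to contribute nothing and the root at −1 contributes $e^{-1}/2$.

```python
import math

def ddelta_root_formula(f, df, dg, d2g, roots, a, b):
    # Each simple root c in (a, b) contributes
    #   -[f'(c) - f(c) g''(c)/g'(c)] / (g'(c) |g'(c)|),
    # i.e. -(1/g'(c)) d/dx [f(x)/|g'(x)|] evaluated at x = c.
    total = 0.0
    for c in roots:
        if a < c < b:
            s = dg(c)
            total -= (df(c) - f(c) * d2g(c) / s) / (s * abs(s))
    return total

def smeared(f, g, a, b, n=10**6, steps=400_000):
    # Replace delta'(y) by the derivative of the Gaussian delta_n,
    # -2 n y sqrt(n/pi) exp(-n y^2), evaluated at y = g(x); midpoint rule on [a, b].
    h = (b - a) / steps
    total = 0.0
    for i in range(steps):
        x = a + (i + 0.5) * h
        y = g(x)
        total += f(x) * (-2 * n * y) * math.sqrt(n / math.pi) * math.exp(-n * y * y)
    return h * total

# Illustrative choice: f(x) = e^x and g(x) = x^2 - 1, with roots -1 and +1.
exact = ddelta_root_formula(math.exp, math.exp, lambda x: 2 * x, lambda x: 2.0,
                            [-1.0, 1.0], -math.inf, math.inf)
approx = smeared(math.exp, lambda x: x * x - 1, -3.0, 3.0)
print(exact, approx, math.exp(-1) / 2)
```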


Example 5.1.4. As a concrete example, let us evaluate the integral

$$I = \int_{-\infty}^{+\infty} f(t)\,\delta(t^2 - a^2)\,dt,$$

where f is a smooth function and a is a nonzero real constant. We can identify g(t)
as $t^2 - a^2$ with roots $c_1 = -a$, $c_2 = a$ and derivative $g'(t) = 2t$. Therefore,
Equation (5.15) reduces to

$$\delta(t^2 - a^2) = \frac{\delta(t - c_1)}{|g'(c_1)|} + \frac{\delta(t - c_2)}{|g'(c_2)|} = \frac{\delta(t - (-a))}{|-2a|} + \frac{\delta(t - a)}{|2a|} = \frac{1}{2|a|}\{\delta(t + a) + \delta(t - a)\}.$$

Substituting in the integral, we obtain

$$\begin{aligned} I &= \frac{1}{2|a|}\int_{-\infty}^{+\infty} f(t)\{\delta(t + a) + \delta(t - a)\}\,dt \\ &= \frac{1}{2|a|}\left\{\int_{-\infty}^{+\infty} f(t)\,\delta(t + a)\,dt + \int_{-\infty}^{+\infty} f(t)\,\delta(t - a)\,dt\right\} \\ &= \frac{1}{2|a|}\{f(-a) + f(a)\}. \end{aligned}$$

Note that the integral vanishes, as expected, if f is odd.


a very important
relation

Example 5.1.5. We illustrate the foregoing general discussion further with some
more concrete examples. To evaluate the integral

$$\int_0^{\infty} \sin t\,\delta(t^2 - \pi^2/4)\,dt,$$

we note that $g(t) = t^2 - \pi^2/4$, which has two roots $c_1 = \pi/2$ and $c_2 = -\pi/2$, with
only the positive root lying in the range of integration. Moreover, $g'(t) = 2t$. Thus,

$$\int_0^{\infty} \sin t\,\delta(t^2 - \pi^2/4)\,dt = \frac{f(c_1)}{|g'(c_1)|} = \frac{\sin(c_1)}{|2c_1|} = \frac{\sin(\pi/2)}{\pi} = \frac{1}{\pi}.$$

On the other hand,

$$\int_{-\infty}^{+\infty} \sin t\,\delta(t^2 - \pi^2/4)\,dt = 0$$

because the second root $c_2$ is also included in the range of integration and its
contribution cancels that of $c_1$.

To evaluate the integral

$$\int_0^{\infty} \ln z\,\delta(z^2 - 4)\,dz$$

we note that $g(z) = z^2 - 4$, which has two roots $c_1 = 2$ and $c_2 = -2$, with only the
positive root lying in the range of integration. Thus, with $g'(z) = 2z$, we have

$$\int_0^{\infty} \ln z\,\delta(z^2 - 4)\,dz = \frac{f(c_1)}{|g'(c_1)|} = \frac{\ln(c_1)}{|2c_1|} = \frac{\ln 2}{4} \approx 0.1733.$$

The integral

$$\int_{-\infty}^{+\infty} f(y)\,\delta(y^2 + a^2)\,dy$$

is zero because there is no point in the range of integration at which the argument
of the Dirac delta function vanishes. In other words, for $a \neq 0$, $g(y) = y^2 + a^2$ has
no real roots at all.

To evaluate the integral

$$\int_{-\pi/2}^{\pi/2} (t + 1)^2\,\delta(\sin \pi t)\,dt$$

we note that $g(t) = \sin \pi t$, which has three roots $c_1 = -1$, $c_2 = 0$, and $c_3 = +1$ in
the range of integration. Thus, with $g'(t) = \pi\cos \pi t$, we have

$$\int_{-\pi/2}^{\pi/2} (t + 1)^2\,\delta(\sin \pi t)\,dt = \sum_{k=1}^{3} \frac{(c_k + 1)^2}{|\pi\cos(c_k\pi)|} = \frac{(-1 + 1)^2}{|\pi\cos(-\pi)|} + \frac{(0 + 1)^2}{|\pi\cos 0|} + \frac{(1 + 1)^2}{|\pi\cos\pi|} = \frac{5}{\pi}.$$

Some other concrete examples are:

$$\int_{-\infty}^{+\infty} \cos x\,\delta(x^2 - \pi^2)\,dx = -\frac{1}{\pi}, \qquad \int_{-\infty}^{3} \cos y\,\delta(y^2 + \pi^2)\,dy = 0,$$

$$\int_{-\infty}^{+\infty} \sin|t|\,\delta(t^2 - \pi^2/4)\,dt = \frac{2}{\pi}, \qquad \int_0^{\infty} \ln z\,\delta(z^2 - 1)\,dz = 0,$$

$$\int_{-\pi}^{+\pi} (t + 1)^2\,\delta(\sin \pi t)\,dt = \frac{35}{\pi}, \qquad \int_0^{\infty} \ln x\,\delta(10x^2 + 3x - 1)\,dx \approx -0.23,$$

$$\int_{-\infty}^{+\infty} f(t)\,\delta(e^t)\,dt = 0, \qquad \int_{-\infty}^{+\infty} f(t)\,\delta(e^t - 1)\,dt = f(0).$$

The reader is urged to derive all the above relations.
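Several of the results above follow mechanically from Equation (5.14) once the roots are known, so they make a convenient self-check. The sketch below applies the root rule (roots supplied by hand, as worked out in the text) to a few of the listed integrals. The dictionary keys are just labels chosen here. For $10x^2 + 3x - 1$ the roots are 0.2 and −0.5, and only 0.2 lies in $(0, \infty)$.

```python
import math

def root_formula(f, dg, roots, a, b):
    # Equation (5.14): add f(c)/|g'(c)| for each simple root c strictly inside (a, b).
    # Roots outside the interval are filtered out before f is evaluated,
    # so math.log never sees a non-positive argument.
    return sum(f(c) / abs(dg(c)) for c in roots if a < c < b)

pi, inf = math.pi, math.inf
checks = {
    # cos x delta(x^2 - pi^2) over the real line
    "cos_pi": root_formula(math.cos, lambda x: 2 * x, [-pi, pi], -inf, inf),
    # sin|t| delta(t^2 - pi^2/4) over the real line
    "sin_abs": root_formula(lambda t: math.sin(abs(t)), lambda t: 2 * t,
                            [-pi / 2, pi / 2], -inf, inf),
    # (t + 1)^2 delta(sin pi t) on (-pi/2, pi/2): roots -1, 0, 1
    "sin_pi_half": root_formula(lambda t: (t + 1) ** 2, lambda t: pi * math.cos(pi * t),
                                range(-1, 2), -pi / 2, pi / 2),
    # (t + 1)^2 delta(sin pi t) on (-pi, pi): roots -3, ..., 3
    "sin_pi_full": root_formula(lambda t: (t + 1) ** 2, lambda t: pi * math.cos(pi * t),
                                range(-3, 4), -pi, pi),
    # ln z delta(z^2 - 4) on (0, infinity)
    "ln_z2_4": root_formula(math.log, lambda z: 2 * z, [2.0, -2.0], 0, inf),
    # ln x delta(10x^2 + 3x - 1) on (0, infinity)
    "ln_quad": root_formula(math.log, lambda x: 20 * x + 3, [0.2, -0.5], 0, inf),
}
for label, value in checks.items():
    print(f"{label:12s} {value: .4f}")
```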
“Physical laws should have mathematical beauty.” This statement was Dirac’s
response to the question of his philosophy of physics, posed to him in Moscow in 1955.
He wrote it on a blackboard that is still preserved today.

Paul Adrien Maurice Dirac (1902-1984) was born in 1902 in Bristol,
England, of a Swiss, French-speaking father and an English mother. His father, a
taciturn man who refused to receive friends at home, enforced young Paul’s silence 
by requiring that only French be spoken at the dinner table. Perhaps this explains 
Dirac’s later disinclination toward collaboration and his general tendency to be a 
loner in most aspects of his life. The fundamental nature of his work made the 
involvement of students difficult, so perhaps Dirac’s personality was well-suited to 
his extraordinary accomplishments. 

Dirac went to Merchant Venturer’s School, the public school where his father 
taught French, and while there displayed great mathematical abilities. Upon
graduation, he followed in his older brother’s footsteps and went to Bristol University to
study electrical engineering. He was 19 when he graduated from Bristol University
in 1921. Unable to find a suitable engineering position due to the economic
recession that gripped post-World War I England, Dirac accepted a fellowship to study
mathematics at Bristol University. This fellowship, together with a grant from the 
Department of Scientific and Industrial Research, made it possible for Dirac to go 
to Cambridge as a research student in 1923. At Cambridge Dirac was exposed to 
the experimental activities of the Cavendish Laboratory, and he became a member 
of the intellectual circle over which Rutherford and Fowler presided. He took his 
PhD in 1926 and was elected in 1927 as a fellow. His appointment as university 
lecturer came in 1929. He assumed the Lucasian professorship following Joseph 
Larmor in 1932 and retired from it in 1969. Two years later he accepted a position 
at Florida State University where he lived out his remaining years. The FSU library 
now carries his name. 

In the late 1920s the relentless march of ideas and discoveries had carried physics to a generally accepted relativistic theory of the electron. Dirac, however, was dissatisfied with the prevailing ideas and, somewhat in isolation, sought a better formulation. By 1928 he had succeeded in finding an equation, the Dirac equation, that accorded with his own ideas and also fitted most of the established principles of the time. Ultimately, this equation, and the physical theory behind it, proved to be one of the great intellectual achievements of the period. It was particularly remarkable for the internal beauty of its mathematical structure, which not only clarified previously mysterious phenomena such as spin and the Fermi-Dirac statistics associated with it, but also predicted the existence of an electron-like particle of negative energy, the antielectron, or positron. More recently, it has come to play a role of great importance in modern mathematics, particularly in the interrelations between topology, geometry, and analysis. Heisenberg characterized the discovery of antimatter by Dirac as “the most decisive discovery in connection with the properties or the nature of elementary particles .... This discovery of particles and antiparticles by Dirac ... changed our whole outlook on atomic physics completely.” One of the interesting implications of his work that predicted the positron was the prediction of a magnetic monopole. Dirac won the Nobel Prize in 1933 for this work.

Dirac is not only one of the chief authors of quantum mechanics, but he is also the creator of quantum electrodynamics and one of the principal architects of quantum field theory. While studying the scattering theory of quantum particles, he invented the (Dirac) delta function; in his attempt at quantizing the general theory of relativity, he founded constrained Hamiltonian dynamics, which is one of the most active areas of theoretical physics research today. One of his greatest contributions is the invention of the bra ⟨ | and ket | ⟩ notation used in quantum theory.

“The amount of theoretical ground one has to cover before being able to solve problems of real practical value is rather large, but this circumstance is an inevitable consequence of the fundamental part played by transformation theory and is likely to become more pronounced in the theoretical physics of the future.”
P.A.M. Dirac (1930)

Paul Adrien Maurice Dirac
1902-1984

While at Cambridge, Dirac did not accept many research students. Those who 
worked with him generally thought that he was a good supervisor, but one who 
did not spend much time with his students. A student needed to be extremely 
independent to work under Dirac. One such student was Dennis Sciama, who later 
became the supervisor of Stephen Hawking, the current holder of the Lucasian chair. 

Salam and Wigner in their Preface to the Festschrift that honors Dirac on his 
seventieth birthday and commemorates his contributions to quantum mechanics 
succinctly assessed the man: 

Dirac is one of the chief creators of quantum mechanics.... Posterity 
will rate Dirac as one of the greatest physicists of all time. The present 
generation values him as one of its greatest teachers.... On those privileged to know him, Dirac has left his mark ... by his human greatness.

He is modest, affectionate, and sets the highest possible standards of 
personal and scientific integrity. He is a legend in his own lifetime and 
rightly so. 

(Taken from Schweber, S. S. “Some chapters for a history of quantum field theory: 
1938-1952,” in Relativity, Groups, and Topology II, vol. 2, B. S. DeWitt and R. 
Stora, eds., North-Holland, Amsterdam, 1984.) 


5.1.3 The Step Function 

The step function $\theta$ is defined as

$$
\theta(x) = \begin{cases} 1 & \text{if } x > 0, \\ 0 & \text{if } x < 0. \end{cases} \tag{5.17}
$$


The $\theta$ function (as it is often called) is useful in writing functions that have discontinuities or cusps. For instance, absolute values can be written in terms of the step function:

$$
|x| = x\,\theta(x) - x\,\theta(-x) \qquad \text{or} \qquad |x - y| = (x - y)\left[\theta(x - y) - \theta(y - x)\right].
$$


A piecewise continuous function such as

$$
g(x) = \begin{cases} g_1(x) & \text{if } 0 < x < 1, \\ g_2(x) & \text{if } x > 1, \end{cases}
$$

can be written as

$$
g(x) = g_1(x)\,\theta(x)\,\theta(1 - x) + g_2(x)\,\theta(x - 1). \tag{5.18}
$$
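As a quick sanity check of the step-function identities above, here is a minimal Python sketch. The value returned at $x = 0$ is an assumption (Equation (5.17) leaves $\theta(0)$ unspecified), and the choices $g_1 = \sin$ and $g_2 = \exp$ are purely illustrative placeholders.

```python
import math

def theta(x):
    """Step function of Eq. (5.17). theta(0) is not defined there;
    returning 0.5 at x = 0 is an arbitrary convention for this sketch."""
    if x > 0:
        return 1.0
    if x < 0:
        return 0.0
    return 0.5

def abs_via_theta(x):
    # |x| = x theta(x) - x theta(-x)
    return x * theta(x) - x * theta(-x)

def g(x, g1=math.sin, g2=math.exp):
    # Eq. (5.18): g1 on 0 < x < 1, g2 on x > 1 (g1, g2 are illustrative choices)
    return g1(x) * theta(x) * theta(1 - x) + g2(x) * theta(x - 1)

for x in (-2.5, -0.3, 0.4, 0.9, 1.7):
    print(f"x = {x:5.2f}   |x| = {abs_via_theta(x):.2f}   g(x) = {g(x):.4f}")
```

Note that the step-function form also assigns $g(x) = 0$ for $x < 0$, where the piecewise definition is silent.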
Because $\theta$ is constant everywhere except at 0, its derivative is zero everywhere except at 0. The discontinuity at 0 makes the derivative infinite there:

$$
\theta'(0) = \lim_{\epsilon \to 0} \frac{\theta(\epsilon) - \theta(-\epsilon)}{2\epsilon} = \lim_{\epsilon \to 0} \frac{1 - 0}{2\epsilon} = \infty.
$$


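This strongly suggests the identification of the derivative of the step function as the Dirac delta function. In fact, noting that

$$
\theta(x - x_0) = \begin{cases} 1 & \text{if } x > x_0, \\ 0 & \text{if } x < x_0, \end{cases} \tag{5.19}
$$

and the fact that $\theta'(x - x_0)$ is zero everywhere except at $x_0$, for any well-behaved function $f(x)$ we obtain

$$
\begin{aligned}
\int_{-\infty}^{\infty} f(x)\,\theta'(x - x_0)\,dx &= \int_{x_0-\epsilon}^{x_0+\epsilon} f(x)\,\theta'(x - x_0)\,dx \approx f(x_0) \int_{x_0-\epsilon}^{x_0+\epsilon} \theta'(x - x_0)\,dx \\
&= f(x_0)\,\theta(x - x_0)\Big|_{x_0-\epsilon}^{x_0+\epsilon} = f(x_0)\big[\underbrace{\theta(\epsilon)}_{=1} - \underbrace{\theta(-\epsilon)}_{=0}\big] = f(x_0).
\end{aligned}
$$

We thus have another important representation of the Dirac delta function:

$$
\delta(x - x_0) = \theta'(x - x_0). \tag{5.20}
$$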


Example 5.1.6. For positive $a$, $\tanh(ax)$ goes to 1 as $x \to \infty$ and to $-1$ as $x \to -\infty$, and it makes a smooth transition from one of these asymptotic values to the other. This transition gets steeper and steeper for larger and larger values of $a$. This suggests the following relation:

$$
\theta(x - x_0) = \frac{1}{2} \lim_{a \to \infty} \{1 + \tanh[a(x - x_0)]\}.
$$

Let $\theta_a(x - x_0)$ stand for the function on the right-hand side for any finite positive $a$. Then

$$
\theta_a'(x - x_0) = \frac{1}{2}\frac{d}{dx}\{1 + \tanh[a(x - x_0)]\} = \frac{a\,\operatorname{sech}^2[a(x - x_0)]}{2}
$$

and

$$
\int_{-\infty}^{\infty} \theta_a'(x - x_0)\,dx = \theta_a(x - x_0)\Big|_{-\infty}^{\infty} = \frac{1}{2}\{1 + \tanh[a(x - x_0)]\}\Big|_{-\infty}^{\infty} = 1
$$

for any value of $a > 0$, in particular for $a \to \infty$. Thus, we get yet another representation of the Dirac delta function:

$$
\delta(x - x_0) = \lim_{a \to \infty} \theta_a'(x - x_0) = \lim_{a \to \infty} \frac{a\,\operatorname{sech}^2[a(x - x_0)]}{2}.
$$

step function and
its relation to
delta function
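The limit can be watched numerically. The Python sketch below integrates $f(x)\,\theta_a'(x - x_0)$ with the midpoint rule and shows the result approaching $f(x_0)$ as $a$ grows. The test function $f(x) = \cos x$, the point $x_0 = 0.5$, the integration window $|x - x_0| \le 30/a$, and the step count are illustrative assumptions, not part of the text.

```python
import math

def sech2(u):
    # sech^2(u) = 4 e^{-2|u|} / (1 + e^{-2|u|})^2, written this way to avoid cosh overflow
    t = math.exp(-2.0 * abs(u))
    return 4.0 * t / (1.0 + t) ** 2

def theta_a_prime(x, x0, a):
    # derivative of (1/2){1 + tanh[a(x - x0)]}
    return 0.5 * a * sech2(a * (x - x0))

def smeared(f, x0, a, steps=20_000):
    """Midpoint-rule estimate of the integral of f(x) theta_a'(x - x0) over |x - x0| <= 30/a."""
    w = 30.0 / a  # beyond this window sech^2 is below ~1e-25
    h = 2.0 * w / steps
    total = 0.0
    for i in range(steps):
        x = x0 - w + (i + 0.5) * h
        total += f(x) * theta_a_prime(x, x0, a)
    return total * h

x0 = 0.5
for a in (1, 10, 100):
    print(a, smeared(math.cos, x0, a), math.cos(x0))
```

For $a = 1$ the smeared value is still far from $\cos x_0$, while for $a = 100$ it agrees to about four decimal places.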
Dirac Delta Function


We can generalize the discussion of the previous section to the case of many variables. For example, in two dimensions using Cartesian coordinates, we can define the functions $\delta_n$ as

$$
\delta_n(x - x_0, y - y_0) = C e^{-n[(x - x_0)^2 + (y - y_0)^2]} = C e^{-n(x - x_0)^2} e^{-n(y - y_0)^2}, \tag{5.21}
$$

where $C$ is a constant to be determined in such a way as to make the integral of $\delta_n$ over the entire $xy$-plane equal to one. A simple calculation will show that $C = n/\pi$: since $\int_{-\infty}^{\infty} e^{-nu^2}\,du = \sqrt{\pi/n}$, each exponential contributes a factor $\sqrt{n/\pi}$. This constant is simply the product of two “one-dimensional constants”: one for the exponential in $x$ and the other for the exponential in $y$. This is as expected, because $\delta_n(x - x_0, y - y_0)$ is defined to be the product of two one-dimensional $\delta_n$'s. Such simplicity is a result of the coordinate system we have used and does not carry over to other (non-Cartesian) coordinate systems, for which the constant $C$ must be evaluated separately.

surface density
function

two-dimensional
Dirac delta
function
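A brute-force check of $C = n/\pi$: the Python sketch below integrates $e^{-n(x^2 + y^2)}$ over the plane with the midpoint rule and inverts the result. The cutoff $|x|, |y| \le 6/\sqrt{n}$ (neglected tails of order $e^{-36}$) and the grid size are assumptions of the sketch; the values of $n$ are those of Figure 5.5.

```python
import math

def gaussian_volume(n, steps=300):
    """Midpoint-rule estimate of the integral of exp(-n (x^2 + y^2)) over the plane,
    truncated to the square |x|, |y| <= 6/sqrt(n)."""
    L = 6.0 / math.sqrt(n)
    h = 2.0 * L / steps
    pts = [-L + (i + 0.5) * h for i in range(steps)]
    total = 0.0
    for x in pts:
        for y in pts:
            total += math.exp(-n * (x * x + y * y))
    return total * h * h

for n in (400, 1000, 4000):
    C = 1.0 / gaussian_volume(n)  # the constant that makes delta_n integrate to one
    print(n, C, n / math.pi)
```

The same number follows from the product of two one-dimensional integrals, $(\sqrt{\pi/n})^2 = \pi/n$, which is the separability the text relies on.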

It should be clear from (5.21) that as $n$ increases, the height of $\delta_n$ at $(x_0, y_0)$ increases while its width decreases (see Figure 5.5). What may not be clear is that this reciprocal behavior takes place in such a way as to keep the volume under the surface equal to one. We can define, as we did in the one-dimensional case, a surface density function as a function whose integral over the entire plane is one. For any $n$, then, $\delta_n$ will be a surface density function.

The passage to the two-dimensional Dirac delta function is now clear:

$$
\delta(x - x_0, y - y_0) = \lim_{n \to \infty} \delta_n(x - x_0, y - y_0). \tag{5.22}
$$


The two-dimensional Dirac delta function above is zero everywhere except at $(x_0, y_0)$, where it is infinite. Thus for the Dirac delta function not to be zero both of its arguments must be zero.

Figure 5.5: As n gets larger and larger, the two-dimensional Gaussian exponential approaches the two-dimensional Dirac delta function. For the left bump, n = 400; for the middle, n = 1000; and for the right spike n = 4000.

It is convenient to define points $P$ and $P_0$ with respective Cartesian coordinates $(x, y)$ and $(x_0, y_0)$, and position vectors $\mathbf{r} = (x, y)$, $\mathbf{r}_0 = (x_0, y_0)$, and write

$$
\delta(x - x_0, y - y_0) = \delta(\mathbf{r} - \mathbf{r}_0) = \begin{cases} \delta(0) = \delta(0, 0) = \infty & \text{if } \mathbf{r} = \mathbf{r}_0, \\ 0 & \text{otherwise.} \end{cases} \tag{5.23}
$$


This means 


Box 5.2.1. The two-dimensional Dirac delta function is zero everywhere 
except at the point which makes both of its arguments zero, in which case 
the two-dimensional Dirac delta function is infinite. 


We noted above that in Cartesian coordinates (and only in Cartesian coordinates) the product of two one-dimensional $\delta_n$'s gave rise to a two-dimensional $\delta_n$, which subsequently yielded the two-dimensional Dirac delta function. Thus only in Cartesian coordinates can we conclude that

$$
\delta(\mathbf{r} - \mathbf{r}_0) = \delta(x - x_0, y - y_0) = \delta(x - x_0)\,\delta(y - y_0). \tag{5.24}
$$


We shall see that in polar coordinates, the two-dimensional delta function 
is not merely the product of two one-dimensional delta functions, but some 
other factor is also present. 

The density property of the two-dimensional Dirac delta function survives the $n \to \infty$ process because the integral of $\delta_n$ is independent of $n$. On the other hand, the delta function is zero everywhere except at the point which makes both of its arguments zero. Therefore, for any two-dimensional region $\Omega$, we have

$$
\iint_\Omega \delta(\mathbf{r} - \mathbf{r}_0)\,da(\mathbf{r}) = \begin{cases} 1 & \text{if } P_0 \text{ is in } \Omega, \\ 0 & \text{otherwise.} \end{cases} \tag{5.25}
$$


Equation (5.25) is written independently of coordinates, and as such, the vector arguments are to be interpreted as coordinates, not components. We can use this equation in polar coordinates to write the two-dimensional Dirac delta function as a product of two one-dimensional delta functions. First write²

$$
\delta(\mathbf{r} - \mathbf{r}_0) = C\,\delta(\rho - \rho_0)\,\delta(\varphi - \varphi_0).
$$

² We use $\rho$ and $\varphi$ instead of the more common $r$ and $\theta$ because we have reserved the latter for the three-dimensional spherical coordinates. There is no danger of confusing the pair $(\rho, \varphi)$ with the corresponding pair in cylindrical coordinates because the two pairs are identical.
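Now substitute this in Equation (5.25) with $\Omega$ being the entire plane, and note that $da = \rho\,d\rho\,d\varphi$:

$$
\begin{aligned}
1 &= \iint C\,\delta(\rho - \rho_0)\,\delta(\varphi - \varphi_0)\,\rho\,d\rho\,d\varphi = C \int_0^\infty \delta(\rho - \rho_0)\,\rho\,d\rho \underbrace{\int_0^{2\pi} \delta(\varphi - \varphi_0)\,d\varphi}_{=1} \\
&= C \rho_0 \quad \Longrightarrow \quad C = \frac{1}{\rho_0}.
\end{aligned}
$$

In the above derivation, we have used properties of the one-dimensional delta function as applied to $\delta(\rho - \rho_0)$ and $\delta(\varphi - \varphi_0)$.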


surface density
and
two-dimensional
delta function

Box 5.2.2. The two-dimensional Dirac delta function can be written in polar coordinates as

$$
\delta(\mathbf{r} - \mathbf{r}_0) = \frac{1}{\rho_0}\,\delta(\rho - \rho_0)\,\delta(\varphi - \varphi_0) = \frac{1}{\rho}\,\delta(\rho - \rho_0)\,\delta(\varphi - \varphi_0). \tag{5.26}
$$

The last equality follows because the Dirac delta function in $\rho$ forces $\rho$ and $\rho_0$ to be equal.
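To see why the factor $1/\rho_0$ is needed, one can replace each one-dimensional delta function by a narrow normalized Gaussian and integrate against $da = \rho\,d\rho\,d\varphi$. In this Python sketch the width $\sigma = 0.01$, the test function $f(\rho, \varphi) = \rho\cos\varphi$ (that is, $x$), and the point $(\rho_0, \varphi_0) = (2, 0.7)$ are illustrative assumptions. With the factor, the integral reproduces $f(\rho_0, \varphi_0)$; without it, the result comes out $\rho_0$ times too large.

```python
import math

def g(u, sigma):
    # narrow normalized Gaussian standing in for a one-dimensional delta function
    return math.exp(-u * u / (2 * sigma * sigma)) / (sigma * math.sqrt(2 * math.pi))

def polar_delta_integral(f, rho0, phi0, include_factor=True, sigma=0.01, steps=120):
    """Midpoint estimate of the integral of f(rho, phi) * C g(rho - rho0) g(phi - phi0)
    against da = rho drho dphi, with C = 1/rho0 (Box 5.2.2) or C = 1 for comparison."""
    w = 6 * sigma  # outside this window the Gaussians are negligible
    h = 2 * w / steps
    C = 1.0 / rho0 if include_factor else 1.0
    total = 0.0
    for i in range(steps):
        rho = rho0 - w + (i + 0.5) * h
        for j in range(steps):
            phi = phi0 - w + (j + 0.5) * h
            total += f(rho, phi) * C * g(rho - rho0, sigma) * g(phi - phi0, sigma) * rho
    return total * h * h

f = lambda rho, phi: rho * math.cos(phi)  # the Cartesian coordinate x, as a test function
rho0, phi0 = 2.0, 0.7
with_factor = polar_delta_integral(f, rho0, phi0)
without_factor = polar_delta_integral(f, rho0, phi0, include_factor=False)
print(with_factor, without_factor, f(rho0, phi0))
```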

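A collection of point physical quantities $Q_1, Q_2, \ldots, Q_n$ located on a surface can be described by a surface density $\sigma_Q(\mathbf{r})$ using the two-dimensional Dirac delta function:

$$
\sigma_Q(\mathbf{r}) = \sum_{k=1}^{n} Q_k\,\delta(\mathbf{r} - \mathbf{r}_k), \tag{5.27}
$$

where $\mathbf{r}_k$ is the position vector of $Q_k$. This equation can be rewritten as

$$
\sigma_Q(x, y) = \sum_{k=1}^{n} Q_k\,\delta(x - x_k)\,\delta(y - y_k)
$$

in Cartesian coordinates, and as

$$
\sigma_Q(\rho, \varphi) = \sum_{k=1}^{n} \frac{Q_k}{\rho_k}\,\delta(\rho - \rho_k)\,\delta(\varphi - \varphi_k) = \frac{1}{\rho} \sum_{k=1}^{n} Q_k\,\delta(\rho - \rho_k)\,\delta(\varphi - \varphi_k)
$$

in polar coordinates.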

Example 5.2.1. With an appropriate choice of the origin and the axes of a Cartesian coordinate system, the surface charge density for four charges $q_1, q_2, q_3, q_4$ located at the four corners of a square of side $2a$ can be written as

$$
\begin{aligned}
\sigma_q(x, y) &= \sum_{k=1}^{4} q_k\,\delta(x - x_k)\,\delta(y - y_k) \\
&= q_1\,\delta(x - a)\,\delta(y - a) + q_2\,\delta(x + a)\,\delta(y - a) \\
&\quad + q_3\,\delta(x + a)\,\delta(y + a) + q_4\,\delta(x - a)\,\delta(y + a).
\end{aligned}
$$

If polar coordinates are used, the surface charge density becomes

$$
\begin{aligned}
\sigma_q(\rho, \varphi) &= \sum_{k=1}^{4} \frac{q_k}{\rho_k}\,\delta(\rho - \rho_k)\,\delta(\varphi - \varphi_k) \\
&= \frac{\delta(\rho - \sqrt{2}\,a)}{\sqrt{2}\,a} \big\{ q_1\,\delta(\varphi - \pi/4) + q_2\,\delta(\varphi - 3\pi/4) \\
&\qquad + q_3\,\delta(\varphi - 5\pi/4) + q_4\,\delta(\varphi - 7\pi/4) \big\}.
\end{aligned}
$$

The reader is urged to study these two equations carefully and make sure to understand the details of their derivation. ■
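The polar coordinates used in the second expression can be checked directly. This short Python sketch (with the arbitrary choice $a = 1$) converts the four corners $(\pm a, \pm a)$ to $(\rho_k, \varphi_k)$, mapping the output of `math.atan2` into $[0, 2\pi)$; it should give $\rho_k = \sqrt{2}\,a$ and $\varphi_k = \pi/4, 3\pi/4, 5\pi/4, 7\pi/4$.

```python
import math

a = 1.0  # half the side of the square; any positive value works
corners = {"q1": (a, a), "q2": (-a, a), "q3": (-a, -a), "q4": (a, -a)}

polar = {}
for name, (x, y) in corners.items():
    rho = math.hypot(x, y)
    phi = math.atan2(y, x) % (2 * math.pi)  # shift atan2's (-pi, pi] onto [0, 2pi)
    polar[name] = (rho, phi)
    print(name, rho / (math.sqrt(2) * a), phi / math.pi)
```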

A more interesting example is the two-dimensional ionic crystal. 

Example 5.2.2. Suppose positive and negative charges $\pm q$ are arranged on an infinite square grid in such a way that the nearest neighbors of each charge have charges of opposite sign, i.e., charges alternate both horizontally and vertically (see Figure 5.6). Assume that the distance between each charge and its nearest neighbor is $a$, and that we place our Cartesian origin at the location of a positive charge. Then the surface charge density can be written as

$$
\sigma_q(x, y) = q \sum_{i=-\infty}^{\infty} \sum_{j=-\infty}^{\infty} (-1)^{i+j}\,\delta(x - ia)\,\delta(y - ja).
$$

For a finite $2M \times 2N$ grid one substitutes the first infinity with $M$ and the second one with $N$. Similarly, one can consider rectangular units of sides $a$ and $b$ for the grid. Then one should change the argument of the second delta function (the one corresponding to $y$) to $y - jb$. ■
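The sign pattern $(-1)^{i+j}$ can be checked on a finite patch of the grid. This Python sketch uses a hypothetical $7 \times 7$ patch ($i, j = -3, \ldots, 3$), with charges in units of $q$, and confirms that every horizontal and vertical neighbor pair has opposite charge, with $+q$ at the origin.

```python
def charge(i, j):
    # (-1)^(i+j) in units of q; Python's % keeps the parity test valid for negative i + j
    return 1 if (i + j) % 2 == 0 else -1

M = N = 3  # hypothetical patch size: i = -M..M, j = -N..N
sites = {(i, j): charge(i, j) for i in range(-M, M + 1) for j in range(-N, N + 1)}

# every horizontal (di = 1) and vertical (dj = 1) neighbour pair must carry opposite charges
alternates = all(
    sites[(i, j)] == -sites[(i + di, j + dj)]
    for (i, j) in sites
    for di, dj in ((1, 0), (0, 1))
    if (i + di, j + dj) in sites
)
print(alternates, sites[(0, 0)], sum(sites.values()))
```

The net charge of this odd-sized patch is $+q$, since it contains one more positive than negative site.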

With an extra dimension at our disposal, we can invent many new varieties of distributions of point physical quantities that were not possible in one dimension. For example, we can put the points on a curve in the $xy$-plane. It is instructive to find the surface density of such a collection of points. The following example examines this problem.


Figure 5.6: A two-dimensional ionic crystal.

two-dimensional
ionic crystal

Figure 5.7: Point charges located on a curve in the xy-plane.


Example 5.2.3. For concreteness, we consider $n$ point charges located at $n$ points $\{P_k\}_{k=1}^{n}$, with $P_k$ having Cartesian coordinates $(x_k, y_k)$. These points are assumed to be on a curve with the Cartesian equation $y = f(x)$ as shown in Figure 5.7. The surface charge density in Cartesian coordinates becomes

$$
\sigma_q(x, y) = \sum_{k=1}^{n} q_k\,\delta(x - x_k)\,\delta(y - y_k) = \sum_{k=1}^{n} q_k\,\delta(x - x_k)\,\delta[y - f(x_k)].
$$

If the curve is given as $\rho = g(\varphi)$, then polar coordinates are more appropriate, and the surface charge density will be³

$$
\sigma_q(\rho, \varphi) = \sum_{k=1}^{n} \frac{q_k}{\rho_k}\,\delta(\rho - \rho_k)\,\delta(\varphi - \varphi_k) = \sum_{k=1}^{n} \frac{q_k}{g(\varphi_k)}\,\delta(\rho - g(\varphi_k))\,\delta(\varphi - \varphi_k).
$$

For instance, if the charges are located on a circle of radius $a$, each separated from its nearest neighbor by an angle $\alpha$, with the first charge on the $x$-axis, then

$$
\sigma_q(\rho, \varphi) = \frac{\delta(\rho - a)}{a} \sum_{k=1}^{n} q_k\,\delta(\varphi - (k - 1)\alpha),
$$

where we have used the fact that $g(\varphi) = a$ for a circle of radius $a$. ■

All the properties of the delta function can be generalized to two dimensions. One important property is given in Equation (5.10).

Box 5.2.3. Let $\Omega$ be a region in the $xy$-plane and $P_0$ a point in the plane; then

$$
\iint_\Omega f(\mathbf{r})\,\delta(\mathbf{r} - \mathbf{r}_0)\,da = \begin{cases} f(x_0, y_0) & \text{if } P_0 \text{ is in } \Omega, \\ 0 & \text{otherwise,} \end{cases} \tag{5.28}
$$

where $(x_0, y_0)$ are the Cartesian coordinates of $P_0$.

³ Because of the two delta functions, one can substitute $\rho$ for $\rho_k$ and $\varphi$ for $\varphi_k$ in the denominators.
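Differentiating both sides with respect to the first argument $x_0$, and using $\partial \delta(\mathbf{r} - \mathbf{r}_0)/\partial x_0 = -\partial_1 \delta(\mathbf{r} - \mathbf{r}_0)$, we easily obtain the analog of Equation (5.12):

$$
\iint_\Omega f(\mathbf{r})\,\partial_1 \delta(\mathbf{r} - \mathbf{r}_0)\,da = \begin{cases} -\partial_1 f(\mathbf{r}_0) = -\partial_1 f(x_0, y_0) & \text{if } P_0 \text{ is in } \Omega, \\ 0 & \text{otherwise,} \end{cases}
$$

with a similar relation for differentiation with respect to the second argument. We can combine the two relations into a single relation:

Box 5.2.4. The derivative of the Dirac delta function in two dimensions satisfies

$$
\iint_\Omega f(\mathbf{r})\,\partial_i \delta(\mathbf{r} - \mathbf{r}_0)\,da = \begin{cases} -\partial_i f(\mathbf{r}_0) = -\partial_i f(x_0, y_0) & \text{if } P_0 \text{ is in } \Omega, \\ 0 & \text{otherwise,} \end{cases}
$$

where $i$ can be 1 or 2, $\partial_1 \equiv \partial_x$ and $\partial_2 \equiv \partial_y$.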
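Box 5.2.4 can be checked numerically by replacing $\delta(\mathbf{r} - \mathbf{r}_0)$ with the Cartesian product $\delta_n(x - x_0)\,\delta_n(y - y_0)$ of one-dimensional Gaussians $\delta_n(u) = \sqrt{n/\pi}\,e^{-nu^2}$, as in Equation (5.21). In this Python sketch, $n = 400$, the test function $f(x, y) = \sin x\,e^{y}$, and the point $(x_0, y_0) = (0.3, 0.2)$ are illustrative choices; the integral should reproduce $-\partial_1 f(x_0, y_0) = -\cos x_0\,e^{y_0}$.

```python
import math

def delta_n(u, n):
    # one-dimensional Gaussian sequence sqrt(n/pi) exp(-n u^2), cf. Eq. (5.21)
    return math.sqrt(n / math.pi) * math.exp(-n * u * u)

def d_delta_n(u, n):
    # derivative of delta_n with respect to its argument: -2 n u delta_n(u)
    return -2.0 * n * u * delta_n(u, n)

def smeared_dx(f, x0, y0, n=400, steps=200):
    """Midpoint estimate of the integral of f(x, y) * [d/dx delta_n(x - x0)] * delta_n(y - y0)."""
    w = 6.0 / math.sqrt(n)  # Gaussian tails beyond this are ~exp(-36)
    h = 2.0 * w / steps
    total = 0.0
    for i in range(steps):
        x = x0 - w + (i + 0.5) * h
        for j in range(steps):
            y = y0 - w + (j + 0.5) * h
            total += f(x, y) * d_delta_n(x - x0, n) * delta_n(y - y0, n)
    return total * h * h

f = lambda x, y: math.sin(x) * math.exp(y)
x0, y0 = 0.3, 0.2
print(smeared_dx(f, x0, y0), -math.cos(x0) * math.exp(y0))
```

For this particular $f$ the Gaussian smearing factors $e^{\mp 1/(4n)}$ from the $x$ and $y$ integrals cancel, so the agreement is limited only by the quadrature.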


5.3 Three-Variable Case 

Once the generalization to two variables is realized, the three-variable (and higher) cases become trivial. In fact, we had such generalizations in mind when we wrote most of the formulas in the last section: All that is needed is to change $da$ to $dV$ and keep in mind that the vectors $\mathbf{r}$ and $\mathbf{r}_0$ have three components, and points in space have three coordinates. Nevertheless, we shall summarize the most important properties of the three-dimensional Dirac delta function.

First we note that

$$
\delta(\mathbf{r} - \mathbf{r}_0) = \begin{cases} \delta(0) = \delta(0, 0, 0) = \infty & \text{if } \mathbf{r} = \mathbf{r}_0, \\ 0 & \text{otherwise.} \end{cases} \tag{5.29}
$$

This means

Box 5.3.1. The three-dimensional Dirac delta function is zero everywhere except at the point which makes all three of its arguments zero, in which case it is infinite.

In Cartesian coordinates, we have

$$
\delta(\mathbf{r} - \mathbf{r}_0) = \delta(x - x_0, y - y_0, z - z_0) = \delta(x - x_0)\,\delta(y - y_0)\,\delta(z - z_0). \tag{5.30}
$$


3D Dirac delta 
function in 
Cartesian 
coordinates 


3D Dirac delta 
function in 
cylindrical and 
spherical 
coordinates 


An argument similar to the two-dimensional case can be used to show that

Box 5.3.2. In cylindrical coordinates

$$
\delta(\mathbf{r} - \mathbf{r}_0) = \frac{1}{\rho_0}\,\delta(\rho - \rho_0)\,\delta(\varphi - \varphi_0)\,\delta(z - z_0), \tag{5.31}
$$

where $\mathbf{r}$ and $\mathbf{r}_0$ on the LHS are to be understood as cylindrical coordinates, not cylindrical position vectors. The corresponding formula for the spherical coordinate system is

$$
\delta(\mathbf{r} - \mathbf{r}_0) = \frac{1}{r_0^2 \sin\theta_0}\,\delta(r - r_0)\,\delta(\theta - \theta_0)\,\delta(\varphi - \varphi_0), \tag{5.32}
$$

with $\mathbf{r}$ and $\mathbf{r}_0$ representing the coordinates $(r, \theta, \varphi)$ and $(r_0, \theta_0, \varphi_0)$, respectively.
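The factor $1/(r_0^2 \sin\theta_0)$ in (5.32) is the reciprocal of the Jacobian of the map $(r, \theta, \varphi) \mapsto (x, y, z)$, the same factor that appears in $dV = r^2 \sin\theta\,dr\,d\theta\,d\varphi$. A small Python sketch, using central differences with step $10^{-6}$ at an arbitrary sample point, confirms that the Jacobian determinant equals $r^2 \sin\theta$.

```python
import math

def cartesian(r, theta, phi):
    # spherical -> Cartesian, with theta measured from the z-axis
    return (r * math.sin(theta) * math.cos(phi),
            r * math.sin(theta) * math.sin(phi),
            r * math.cos(theta))

def jacobian_det(r, theta, phi, step=1e-6):
    """Central-difference estimate of det[d(x, y, z)/d(r, theta, phi)]."""
    q = [r, theta, phi]
    rows = []
    for k in range(3):
        qp, qm = list(q), list(q)
        qp[k] += step
        qm[k] -= step
        xp, xm = cartesian(*qp), cartesian(*qm)
        rows.append([(xp[m] - xm[m]) / (2 * step) for m in range(3)])
    # rows[k][m] = d x_m / d q_k; a matrix and its transpose share the same determinant
    (a11, a12, a13), (a21, a22, a23), (a31, a32, a33) = rows
    return (a11 * (a22 * a33 - a23 * a32)
            - a12 * (a21 * a33 - a23 * a31)
            + a13 * (a21 * a32 - a22 * a31))

r0, theta0, phi0 = 2.0, 0.9, 1.3
print(jacobian_det(r0, theta0, phi0), r0 ** 2 * math.sin(theta0))
```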


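The density property of the three-dimensional Dirac delta function is given by

$$
\iiint_\Omega \delta(\mathbf{r} - \mathbf{r}_0)\,dV(\mathbf{r}) = \begin{cases} 1 & \text{if } P_0 \text{ is in } \Omega, \\ 0 & \text{otherwise,} \end{cases} \tag{5.33}
$$

where $\Omega$ is a region of space and $P_0$ is the point with Cartesian coordinates $(x_0, y_0, z_0)$, spherical coordinates $(r_0, \theta_0, \varphi_0)$, and cylindrical coordinates $(\rho_0, \varphi_0, z_0)$. Similarly,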


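Box 5.3.3. If $\Omega$ is a region of space, then for a “good” function $f(\mathbf{r})$,

$$
\iiint_\Omega f(\mathbf{r})\,\delta(\mathbf{r} - \mathbf{r}_0)\,dV(\mathbf{r}) = \begin{cases} f(x_0, y_0, z_0) & \text{if } P_0 \text{ is in } \Omega, \\ 0 & \text{otherwise.} \end{cases}
$$

Thus integration reduces to the evaluation of the function $f$ at the coordinates of $P_0$.

volume density
and 3D delta
function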


The density property allows us to write the distribution of discrete physical quantities in terms of the three-dimensional Dirac delta function. In general,

$$
\rho_Q(\mathbf{r}) = \sum_{k=1}^{n} Q_k\,\delta(\mathbf{r} - \mathbf{r}_k), \tag{5.34}
$$

which can be rewritten as

$$
\rho_Q(x, y, z) = \sum_{k=1}^{n} Q_k\,\delta(x - x_k)\,\delta(y - y_k)\,\delta(z - z_k)
$$

in Cartesian coordinates, as

$$
\rho_Q(\rho, \varphi, z) = \sum_{k=1}^{n} \frac{Q_k}{\rho_k}\,\delta(\rho - \rho_k)\,\delta(\varphi - \varphi_k)\,\delta(z - z_k)
$$

in cylindrical coordinates, and as

$$
\rho_Q(r, \theta, \varphi) = \sum_{k=1}^{n} \frac{Q_k}{r_k^2 \sin\theta_k}\,\delta(r - r_k)\,\delta(\theta - \theta_k)\,\delta(\varphi - \varphi_k)
$$

in the spherical coordinate system. In fact, the linear and surface distributions of a physical quantity involving the Dirac delta function are special cases of the volume distribution. For instance, a collection of point quantities in the $xy$-plane can be described by the volume density

$$
\rho_Q(x, y, z) = \sum_{k=1}^{n} Q_k\,\delta(x - x_k)\,\delta(y - y_k)\,\delta(z) = \delta(z) \sum_{k=1}^{n} Q_k\,\delta(x - x_k)\,\delta(y - y_k).
$$

The delta function outside the sum restricts the $z$-coordinates of the point quantities to zero, and thus confines them to the $xy$-plane. Similarly,

$$
\rho_Q(r, \theta, \varphi) = \frac{\delta(r - a)}{a^2} \sum_{k=1}^{n} \frac{Q_k}{\sin\theta_k}\,\delta(\theta - \theta_k)\,\delta(\varphi - \varphi_k)
$$

describes a distribution of $n$ point quantities on a sphere of radius $a$.

Example 5.3.1. Let us calculate the electrostatic field of the one-dimensional infinite ionic crystal in Cartesian coordinates. Assume that the charges are located on the $z$-axis (Figure 5.8). We treat this as a three-dimensional charge distribution with density

$$
\rho_q(x, y, z) = q \sum_{k=-\infty}^{\infty} (-1)^k\,\delta(x)\,\delta(y)\,\delta(z - ka). \tag{5.35}
$$

Figure 5.8: The geometry for the calculation of the electrostatic field of the one-dimensional ionic crystal.
The first two delta functions restrict the charges to the $z$-axis and the third locates them. This density is to be substituted in the equation for the electric field in Cartesian coordinates. Let us concentrate on the $x$-component:

$$
E_x(x, y, z) = k_e \iiint_\Omega \frac{\rho_q(x', y', z')\,(x - x')\,dx'\,dy'\,dz'}{\{(x - x')^2 + (y - y')^2 + (z - z')^2\}^{3/2}}.
$$

We can always take $\Omega$ to be the entire space because the delta function will restrict the integration to the region of charges automatically. We can also choose our coordinate system so that the field point lies in the $xz$-plane, i.e., $y = 0$. Note that we have to prime all the arguments of $\rho_q$ before we substitute it in the integral. Having done this, we obtain

$$
E_x(x, 0, z) = k_e q \sum_{k=-\infty}^{\infty} (-1)^k \iiint_\Omega \frac{(x - x')\,\delta(x')\,\delta(y')\,\delta(z' - ka)\,dx'\,dy'\,dz'}{\{(x - x')^2 + y'^2 + (z - z')^2\}^{3/2}}.
$$

Using Box 5.3.3, noting that

$$
f(x', y', z') = \frac{x - x'}{\{(x - x')^2 + y'^2 + (z - z')^2\}^{3/2}},
$$

and that the result of integration is the evaluation of $f$ at $x' = 0 = y'$, $z' = ka$, we obtain


$$
\begin{aligned}
E_x(x, 0, z) &= k_e q \sum_{k=-\infty}^{\infty} (-1)^k \frac{x}{\{x^2 + (z - ka)^2\}^{3/2}} \\
&= k_e q \sum_{k=-\infty}^{-1} (-1)^k \frac{x}{\{x^2 + (z - ka)^2\}^{3/2}} + k_e q \frac{x}{(x^2 + z^2)^{3/2}} + k_e q \sum_{k=1}^{\infty} (-1)^k \frac{x}{\{x^2 + (z - ka)^2\}^{3/2}},
\end{aligned}
$$

where we have broken up the summation into three pieces, a permissible act as long as the series converges. We can combine the first and third terms by changing $k$ to $-k$ in the first and noting that

$$
(-1)^{-k} = \frac{1}{(-1)^k} = (-1)^k.
$$

Doing so, we get

$$
E_x(x, 0, z) = k_e q \frac{x}{(x^2 + z^2)^{3/2}} + k_e q \sum_{k=1}^{\infty} (-1)^k \left( \frac{x}{\{x^2 + (z + ka)^2\}^{3/2}} + \frac{x}{\{x^2 + (z - ka)^2\}^{3/2}} \right).
$$

The other components of the field can be found similarly:

$$
\begin{aligned}
E_y(x, 0, z) &= 0, \\
E_z(x, 0, z) &= k_e q \frac{z}{(x^2 + z^2)^{3/2}} + k_e q \sum_{k=1}^{\infty} (-1)^k \left( \frac{z + ka}{\{x^2 + (z + ka)^2\}^{3/2}} + \frac{z - ka}{\{x^2 + (z - ka)^2\}^{3/2}} \right).
\end{aligned} \tag{5.36}
$$
Let us further simplify the problem by positioning the field point on the $x$-axis, i.e., setting $z = 0$. This reduces the above expressions to

$$
\begin{aligned}
E_x(x, 0, 0) &= \frac{k_e q}{x^2} + 2 k_e q \sum_{k=1}^{\infty} (-1)^k \frac{x}{(x^2 + k^2 a^2)^{3/2}}, \\
E_y(x, 0, 0) &= 0, \\
E_z(x, 0, 0) &= k_e q \sum_{k=1}^{\infty} (-1)^k \left( \frac{ka}{\{x^2 + (ka)^2\}^{3/2}} + \frac{-ka}{\{x^2 + (-ka)^2\}^{3/2}} \right) = 0.
\end{aligned}
$$

At a distance $a$ from the origin on the $x$-axis, the field strength is

$$
\begin{aligned}
E_x(a, 0, 0) &= \frac{k_e q}{a^2} \left\{ 1 + 2 \sum_{k=1}^{\infty} \frac{(-1)^k}{(1 + k^2)^{3/2}} \right\} = \frac{k_e q}{a^2} \{1 + 2(-0.286269)\} = 0.42746\,\frac{k_e q}{a^2}, \\
E_y(a, 0, 0) &= 0, \qquad E_z(a, 0, 0) = 0,
\end{aligned}
$$

where the numerical value for the sum, accurate to six decimal places, is obtained by adding its first 150 terms.
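The figure quoted for the sum is easy to reproduce. This Python sketch adds the first 150 terms of $\sum_{k \ge 1} (-1)^k/(1 + k^2)^{3/2}$ and forms the coefficient $1 + 2S$ of $k_e q/a^2$. Because the series alternates with terms decreasing like $k^{-3}$, the truncation error is bounded by the first omitted term, which the sketch also prints.

```python
def field_sum(terms):
    # S = sum_{k=1}^{terms} (-1)^k / (1 + k^2)^(3/2)
    return sum((-1) ** k / (1 + k * k) ** 1.5 for k in range(1, terms + 1))

S = field_sum(150)
coefficient = 1 + 2 * S  # E_x(a, 0, 0) in units of k_e q / a^2
bound = 1 / (1 + 151 ** 2) ** 1.5  # alternating-series error bound: first omitted term
print(f"S = {S:.6f}, 1 + 2S = {coefficient:.5f}, error bound = {bound:.1e}")
```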

Another useful quantity is the electrostatic potential, which for an arbitrary charge distribution is given by

$$
\Phi(\mathbf{r}) = k_e \iiint_\Omega \frac{dq(\mathbf{r}')}{|\mathbf{r} - \mathbf{r}'|}. \tag{5.37}
$$

For the one-dimensional crystal, with the volume charge density of Equation (5.35), the electrostatic potential at an arbitrary point $(x, y, z)$ in space becomes

$$
\begin{aligned}
\Phi(x, y, z) &= k_e \iiint_\Omega \frac{\rho_q(x', y', z')\,dx'\,dy'\,dz'}{\sqrt{(x - x')^2 + (y - y')^2 + (z - z')^2}} \\
&= k_e q \sum_{k=-\infty}^{\infty} (-1)^k \iiint_\Omega \frac{\delta(x')\,\delta(y')\,\delta(z' - ka)\,dx'\,dy'\,dz'}{\sqrt{(x - x')^2 + (y - y')^2 + (z - z')^2}} \\
&= k_e q \sum_{k=-\infty}^{\infty} \frac{(-1)^k}{\sqrt{x^2 + y^2 + (z - ka)^2}}.
\end{aligned}
$$

If we are interested in the potential at a specific point such as $(x, 0, 0)$, the expression simplifies to

$$
\Phi(x, 0, 0) = k_e q \sum_{k=-\infty}^{\infty} \frac{(-1)^k}{\sqrt{x^2 + k^2 a^2}} = \frac{k_e q}{|x|} + 2 k_e q \sum_{k=1}^{\infty} \frac{(-1)^k}{\sqrt{x^2 + k^2 a^2}}.
$$
For $x = a$, this further simplifies to

$$
\Phi(a, 0, 0) = \frac{k_e q}{a} \left\{ 1 + 2 \sum_{k=1}^{\infty} \frac{(-1)^k}{\sqrt{1 + k^2}} \right\} = \frac{k_e q}{a} \{1 + 2(-0.4409)\} = 0.1182\,\frac{k_e q}{a}.
$$

We note that the potential is positive, because the field point is closest to the positive charge at the origin. To obtain the numerical value of the sum accurate to only four decimal places, we have to add at least 40,000 terms! This sum, therefore, converges much more slowly than the sum encountered in the evaluation of $E_x$ above. ■
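The slow convergence is easy to see numerically. In this Python sketch the 40,000-term partial sum is compared with the average of two consecutive partial sums, a standard acceleration trick for alternating series that is not part of the text's argument; the 100-term partial sum is included to show how far off a short truncation is.

```python
import math

def term(k):
    # k-th term of the potential sum: (-1)^k / sqrt(1 + k^2)
    return (-1) ** k / math.sqrt(1 + k * k)

def partial_sum(terms):
    return sum(term(k) for k in range(1, terms + 1))

S_short = partial_sum(100)
S_long = partial_sum(40_000)
S_avg = S_long + 0.5 * term(40_001)  # mean of the partial sums S_40000 and S_40001
print(f"S_100 = {S_short:.5f}, S_40000 = {S_long:.5f}, averaged = {S_avg:.5f}")
print(f"coefficient of k_e q / a: {1 + 2 * S_avg:.4f}")
```

Since the terms fall off only like $1/k$, the error of a plain partial sum shrinks roughly like $1/(2N)$, compared with $N^{-3}$ for the field sum.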


An important physical quantity for real crystals is the potential energy $U$ of the crystal. Physically, this is the amount of energy required to assemble the charges in their final configuration. A positive potential energy corresponds to positive energy stored in the system, i.e., a tendency for the system to provide energy to the outside, once disrupted slightly from its equilibrium position. A negative potential energy is a sign of the stability of the system, i.e., the tendency for the system to restore its original configuration if disrupted slightly from its equilibrium position.⁴ It is shown in electrostatics that the potential energy of a system located within the region $\Omega$ is

$$
U = \frac{1}{2} \iiint_\Omega dq(\mathbf{r})\,\Phi(\mathbf{r}). \tag{5.38}
$$

electrostatic
potential energy of
a one-dimensional
crystal


Example 5.3.2. Let us calculate the electrostatic potential energy of the one-dimensional crystal. Let us assume that there are a total of $2N + 1$ charges stretching from $z = -Na$ to $z = +Na$ with a positive charge at the origin. Eventually we shall let $N$ go to infinity, but, in order not to deal explicitly with infinities, we assume that $N$ is finite but large. Substituting in (5.38) the element of charge in terms of volume density, and the electrostatic potential found in the previous example, we find

$$
\begin{aligned}
U &= \frac{1}{2} \iiint_\Omega \rho_q(x, y, z)\,\Phi(x, y, z)\,dx\,dy\,dz \\
&= \frac{1}{2} \iiint_\Omega q \sum_{j=-N}^{N} (-1)^j\,\delta(x)\,\delta(y)\,\delta(z - ja)\; k_e q \sum_{\substack{k=-N \\ k \neq j}}^{N} \frac{(-1)^k}{\sqrt{x^2 + y^2 + (z - ka)^2}}\,dx\,dy\,dz \\
&= \frac{k_e q^2}{2} \sum_{j=-N}^{N} \sum_{\substack{k=-N \\ k \neq j}}^{N} \frac{(-1)^{j+k}}{\sqrt{(ja - ka)^2}}.
\end{aligned}
$$

⁴ A system that has negative potential energy requires some positive energy (such as kinetic energy of a projectile) to reach a state of zero potential energy corresponding to dissociation of its parts and their removal to infinity (where potential energy is zero).
The restriction $k \neq j$ is necessary, because the $k = j$ terms correspond to the interaction energy of each charge with itself, and should be excluded. Continuing with the calculation, we write

$$
U = \frac{k_e q^2}{2a} \sum_{j=-N}^{N} \sum_{\substack{k=-N \\ k \neq j}}^{N} \frac{(-1)^{j+k}}{|j - k|} = \frac{k_e q^2}{2a} \sum_{j=-N}^{N} \left\{ \sum_{k=-N}^{j-1} \frac{(-1)^{j+k}}{j - k} + \sum_{k=j+1}^{N} \frac{(-1)^{j+k}}{k - j} \right\}.
$$

In the first inner sum, let $j - k = m$, and in the second let $k - j = m$. These substitutions change the limits of the sums, and we get

$$
U = \frac{k_e q^2}{2a} \sum_{j=-N}^{N} \left\{ \sum_{m=1}^{N+j} \frac{(-1)^{2j-m}}{m} + \sum_{m=1}^{N-j} \frac{(-1)^{2j+m}}{m} \right\} = \frac{k_e q^2}{2a} \sum_{j=-N}^{N} \left\{ \sum_{m=1}^{N+j} \frac{(-1)^m}{m} + \sum_{m=1}^{N-j} \frac{(-1)^m}{m} \right\}.
$$

To evaluate the inner sums, denoted by $S$, we now assume that $N$ is very large (compared to $j$), so that $N - j \approx N \approx N + j$. Then the inner sum yields⁵

$$
S = \sum_{m=1}^{N+j} \frac{(-1)^m}{m} + \sum_{m=1}^{N-j} \frac{(-1)^m}{m} \approx 2 \sum_{m=1}^{\infty} \frac{(-1)^m}{m} = -2 \sum_{m=1}^{\infty} \frac{(-1)^{m+1}}{m} = -2 \ln 2.
$$

Substituting $S$ in the expression for $U$, we get

$$
U = \frac{k_e q^2}{2a} \sum_{j=-N}^{N} (-2 \ln 2) = -\frac{k_e q^2}{a} \ln 2 \sum_{j=-N}^{N} 1 = -(2N + 1)\,\frac{k_e q^2}{a} \ln 2.
$$


The negative sign indicates that the one-dimensional salt crystal is stable. A useful quantity used in solid-state physics is the ionization energy per molecule, which is defined to be the potential energy divided by the number of molecules. Noting that the number of molecules is half the number of particles, we obtain

$$
u = \frac{U}{N} = -\frac{2N + 1}{N}\,\frac{k_e q^2}{a} \ln 2 \approx -\frac{k_e q^2}{a}\,2 \ln 2 \equiv -\alpha\,\frac{k_e q^2}{a}.
$$

A real three-dimensional salt crystal has an expression of exactly the same form. However, the constant $\alpha$, called the Madelung constant, has the value of 1.747565 instead of $2 \ln 2 = 1.386294$. (See Problem 5.17 for an alternative way of calculating the potential energy of the one-dimensional ionic crystal.) ■
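The $\ln 2$ result can also be checked without the large-$N$ approximation used for $S$. The Python sketch below (energies in units of $k_e q^2/a$) sums the energy exactly for finite $N$ by noting that there are $2(2N + 1 - m)$ ordered pairs at separation $m$, each contributing $(-1)^m/m$; dividing by $(2N + 1)/2$ molecules, it approaches $-2 \ln 2 \approx -1.386294$ as $N$ grows. The pair-counting shortcut is an implementation choice, not the text's derivation, so a brute-force double sum is included for cross-checking small $N$.

```python
import math

def lattice_energy(N):
    """U in units of k_e q^2 / a for 2N + 1 alternating charges at z = ja, j = -N..N.
    U = (1/2) * sum over ordered pairs, and 2(2N + 1 - m) ordered pairs sit at separation m."""
    return sum((2 * N + 1 - m) * (-1) ** m / m for m in range(1, 2 * N + 1))

def direct_energy(N):
    # brute-force double sum of Example 5.3.2, for cross-checking small N
    return 0.5 * sum((-1) ** abs(j + k) / abs(j - k)
                     for j in range(-N, N + 1) for k in range(-N, N + 1) if j != k)

for N in (10, 100, 10_000):
    per_molecule = lattice_energy(N) / ((2 * N + 1) / 2)
    print(N, per_molecule, -2 * math.log(2))
```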


⁵ We are really cheating here! The sum over $j$ indicates that $j$ can assume values close to $N$, and therefore, the approximation is not valid for such $j$'s. However, a careful analysis, in which one breaks up the sum over $j$ and separates large and small values of $j$, shows that the original approximation is valid as long as $N$ is large enough.


Madelung 

constant 
5.4 Problems


5.1. Plot the distribution on the real line of each of the following electric linear charge densities:

(a) $\lambda(x) = \delta(x - 2)$. (b) $\lambda(x) = -\delta(x + 1)$.

(c) $\lambda(x) = 5\delta(x) - 3\delta(x + 3)$. (d) $\lambda(x) = \delta(x + 1) + 3\delta(x - 1)$.


5.2. Evaluate the following integrals: 


p oo p 2 

(a) / e x sin ^-S(x 2 — 1) dx. (b) / e x sin ^-S(x 2 — 1) dx. 

Jo 2 J —2 2 


(c) J e x sin -^-5(x 3 + 1) dx. 


(d) [ sin + 1) dx. 


—oo 
oo 


poo pOO 

(e) / sin _1 (l/x)d(x 4 — 1) dx. (f) / cos(nx)5(6x 2 ~ x — l) dx. 
J 0 J —oo 


7re 


/ c_ 

e* sin <5(e x sin ^-) dx. 

-O*"' " 


J-0.1 

(i) [ e s[nx S(cosx)dx. 
Jo 


' —oo 

/•OO 


7T2\ 

T 


0) / sin" 
./0 


<5(x 4 — 4) dx. 


/ oo /»oo 

e x sin <5(4x 2 — 1) dx. (1) / ln(l + x) sin ^-5(x 3 — 1) 
-oo ^ J — OO " 


— 1) dx. 


' — OO 

/»oo 


7re 


(m) / sin -2A—<5(x 3 + 1) dx. 


5.3. Show that

$$
\int_{-\infty}^{+\infty} f(x)\,\delta'(x - x_0)\,dx = -f'(x_0)
$$


and 


r + OO 


/ +oo 

f'( x )Hg( x )) dx 

-oo 


5.4. Evaluate the following integrals: 

/‘OO p2 

(a) / sin ^-5'(x 2 — 1) dx. (b) / sin <5'(x 2 — 1) dx. 

Jo 2 J _2 2 

pOO poo / x \ 

(c) J e x sin ^-<5'(x 3 + 1) dx. (d) J sin ( J d'(x 4 + 1) dx. 

/»oo p OO 

(e) / sin^ 1 (l/x)d'(x 4 — 1) dx. (f) / cos(7rx)d , (6x 2 — x — 1) dx. 

J 0 J — oo 
(g) J sin (^-^J 5'(x 2 + x) dx. (h) J e x sin ^-6'(e x sin dx.


(i) / e sinx J'(cosx)dx. 


0) 


sm 


-l 


2 v 2 
' 6'(x 4 — 4) dx. 


/ oo „ roo 

e x sin ^-5’ (4a; 2 — 1) dec. (1) / ln(l + x) sin ~^-5'(x 3 — 1) dx. 

- OO ^ J — OO " 

/ °° 7TP X 

sin —— d'(:r 3 + 1) dec. 

-OO " 

5.5. Use integration by parts (or differentiation with respect to $x_0$) to show that

$$
\int_{-\infty}^{+\infty} f(x)\,\delta''(x - x_0)\,dx = f''(x_0)
$$

and

$$
\int_{-\infty}^{+\infty} f(x)\,\delta'''(x - x_0)\,dx = -f'''(x_0),
$$

and, in general,

$$
\int_{-\infty}^{+\infty} f(x)\,\delta^{(n)}(x - x_0)\,dx = (-1)^n f^{(n)}(x_0),
$$

where $\delta^{(n)}$ and $f^{(n)}$ represent the $n$th derivatives.

5.6. Derive Equation (5.16). Hint: Use the result of Problem 5.3. 

5.7. Six point charges of equal strength q are equally spaced on a circle of 
radius a. What is the volume charge density describing such a distribution in 
cylindrical coordinates? 

5.8. Convince yourself that

$$
\sigma_q(x, y) = q \sum_{i=-\infty}^{\infty} \sum_{j=-\infty}^{\infty} (-1)^{i+j}\,\delta(x - ia)\,\delta(y - ja)
$$

indeed describes a two-dimensional ionic crystal. Pay particular attention to the power of $(-1)$.

5.9. Derive Equations (5.31) and (5.32). 

5.10. Plot (or describe) the distribution in space of each of the following volume charge densities:

$$
\begin{aligned}
\rho_q(x, y, z) &= \delta(x)\,\delta(y)\{2\delta(z) - 3\delta(z + 3)\}, \\
\rho_q(x, y, z) &= 5\delta(x + 1)\,\delta(y - 1)\{\delta(z - 1) - \delta(z + 1)\}, \\
\rho_q(\rho, \varphi, z) &= -2\,\delta(\rho - 3)\,\delta(\varphi - \pi)\,\delta(z), \\
\rho_q(\rho, \varphi, z) &= 2\,\delta(\varphi - \pi/4)\,\delta(z) \sum_{k=1}^{10} (-1)^{k+1}\,\delta(\rho - 0.5k),
\end{aligned}
$$


( 10 

Pq(r, 0, <p) = 2 %> - 7r/4)5(r - 2) ^(-l) fc+1 <5 (<9 - 

U=i 
r 20 

Pg(^ 6>, = 2J(6» - tt/4 )S(r - 2) i ^(-l) fc+1 <5 (y - 


5.11. Derive Equation (5.36). 

5.12. Plot $\theta(t)\theta(1 - t)$, $\theta(t) - \theta(-t)$, and $\theta(t^2 + 1)$ for $-\infty < t < +\infty$.

5.13. Write $\theta(t^2 - 1)$ as a product of two step functions.

5.14. For the two-dimensional ionic crystal shown in Figure 5.6:

(a) write the volume charge density describing the distribution (charges are in the $xy$-plane);

(b) calculate the electrostatic field at $(0, 0, a)$; and

(c) calculate the electrostatic potential at an arbitrary point in space with coordinates $(x, y, z)$.

(d) Show that the ionization energy is of the form $-\alpha k_e q^2/a$ with $\alpha$ given in terms of a sum.

(e) Numerically evaluate $\alpha$.

5.15. For the three-dimensional ionic crystal:

(a) write the volume charge density describing the distribution; and

(b) calculate the electrostatic potential at an arbitrary point in space with coordinates $(x, y, z)$.

(c) Show that the ionization energy is of the form $-\alpha k_e q^2/a$ with $\alpha$ given in terms of a sum.

(d) Numerically evaluate $\alpha$.

5.16. Two electric charges $+q$ and $-q$ are located at $P_1$ and $P_2$ with position
vectors $\mathbf{r}_1$ and $\mathbf{r}_2$.

(a) Write the volume charge density describing these charges.

(b) Use (a) to find their dipole moment defined by $\iiint \mathbf{r}'\,dq(\mathbf{r}')$.

5.17. The electric charge density of the one-dimensional ionic crystal can be
written as $\rho(\mathbf{r}) = \sum_{j=-N}^{N} q_j\,\delta(\mathbf{r}-\mathbf{r}_j)$.

(a) Substitute this in Equation (5.38) and get 
(b) Assuming that $N$ is very large (infinite), convince yourself that all products
$q_i\phi(\mathbf{r}_i)$ in the sum are equal (in particular, the sign of the charge does not
matter). Therefore, $U = \frac{1}{2}(2N+1)q_0\phi(\mathbf{r}_0)$, where the subscript 0 denotes the
zeroth charge.

(c) Show that $\phi(\mathbf{r}_0) = \sum_{j=-N,\, j\neq 0}^{N} k_e q_j/|\mathbf{r}_j-\mathbf{r}_0|$.

(d) Place the origin at the location of the zeroth charge, and assume that
this charge is positive. Then, $\mathbf{r}_0 = 0$, $\mathbf{r}_j = ja\,\hat{\mathbf{e}}_z$, and $q_j = -(-1)^{j+1}q$. Now
show that

$$U = -\left(N+\tfrac{1}{2}\right)q^2k_e\sum_{\substack{j=-N\\ j\neq 0}}^{N}\frac{(-1)^{j+1}}{|j|\,a}.$$

(e) By breaking up the sum into two parts, show that

$$U = -(2N+1)\frac{k_e q^2}{a}\sum_{j=1}^{N}\frac{(-1)^{j+1}}{j}.$$

5.18. $2N$ charges of equal sign and magnitude $q$ are arranged equally spaced
on a circle of radius $a$ located in the $xy$-plane. Assume that the charge numbered $2N$ is at $(a, 0, 0)$.

(a) Write the volume charge density of such a distribution in cylindrical coordinates.

(b) Starting with an integral expression for the electric field, find the cylindrical components of the field at an arbitrary point $P$ in space in terms of
a sum. The coordinates of $P$ are $(\rho, \varphi, z)$. Simplify your answer as much as
possible.

(c) Now let $P$ have coordinates $(2a, 0, 0)$. Show that all components of the
field are of the form $(k_e q/a^2)\alpha$. Express the $\alpha$ for each component in terms
of a sum. What do you expect the value of $\alpha$ to be? Can you find that value?

(d) For $N = 3$, i.e., six charges, calculate the numerical value of $\alpha$ in part (c)
for all components.

5.19. $2N + 1$ charges of equal sign and magnitude $q$ are arranged on the $x$-axis of a Cartesian coordinate system as shown in Figure 5.9, with the zeroth
charge at the origin. The numbers below the axis are labels of the charges.

(a) From the pattern of the figure, determine the location of the $k$th charge
for $-N \le k \le N$.

Figure 5.9: The charges and their distances on the $x$-axis. [Labels 3, 2, 1, 0, −1, −2, −3 appear below the axis; distance markers $a$, $4a$, and $9a$ are drawn symmetrically on both sides of the origin.]
(b) Write a volume charge density in terms of the Dirac delta function describing such a charge distribution.

(c) Calculate the components of the electric field at a general point $P$ with
coordinates $(x, y, z)$.

(d) Now let $P$ have coordinates $(a, a, 0)$. Show that all components of the
field are of the form $(k_e q/a^2)\alpha$, where $\alpha$ is a numerical factor. Find this factor
for each component.

5.20. $2N$ positive and negative charges of equal magnitude are arranged
equally spaced and alternating in sign on a circle of radius $a$.

(a) Write the expression of the volume charge density describing this charge
distribution.

(b) Find the ionization energy in the form $-\alpha k_e q^2/a$ with $\alpha$ given in terms
of a sum. Simplify this sum as much as possible.




Part II 

Algebra of Vectors 



Chapter 6 

Planar and Spatial Vectors 


The preceding chapters made heavy use of vectors in the plane and in space.
The enormous utility of the concept of vectors has prompted mathematicians
and physicists to generalize this concept to include other objects that at first
glance bear no resemblance whatsoever to planar and spatial vectors.
In this chapter, we shall study this generalization in a limited form, i.e.,
only in an algebraic context. Although the analysis of vectors is discussed
in Chapters 12 through 17, it is confined to vectors in space. The analysis
of generalized vectors is the subject of differential geometry and functional
analysis, which are beyond the scope of this book.¹

There are many mathematical objects used in physics that allow for the 
two operations of addition and multiplication by a number. The collection 
of such objects is called a vector space. Thus, a vector space is a bunch 
of “things” having the property that when you add two “things” you get a 
third one, and if you multiply a “thing” by a number you get another one of 
those “things.” Furthermore, the operation of multiplication by a number and 
addition of “things” is distributive, and a vector space always has a “thing” 
that we call the zero vector. 

Using the two operations of multiplication by a number and addition, we
can form a sum,

$$\alpha_1\mathbf{a}_1 + \alpha_2\mathbf{a}_2 + \cdots + \alpha_n\mathbf{a}_n, \tag{6.1}$$

where $\alpha_1, \alpha_2, \ldots, \alpha_n$ are real numbers and $\mathbf{a}_1, \mathbf{a}_2, \ldots, \mathbf{a}_n$ are vectors. The
sum in Equation (6.1) is called a linear combination of the $n$ vectors, and
$\alpha_1, \alpha_2, \ldots, \alpha_n$ are called the coefficients of the linear combination.



1 Hassani, S. Mathematical Physics: A Modern Introduction to Its Foundations,
Springer-Verlag, 1999, discusses differential geometry and functional analysis in some detail.





Box 6.0.1. If we can find some set of real numbers $\alpha_1, \alpha_2, \ldots, \alpha_n$ (not
all of which are zero) such that the sum in (6.1) is zero, we say that the
vectors are linearly dependent. If no such set of real numbers can be
found, then the vectors are called linearly independent.


6.1 Vectors in a Plane Revisited 

Before elaborating further on the generalization of vectors and their spaces, 
it is instructive to revisit the familiar vectors in a plane from a point of view 
suitable for generalization. We first discuss the notion of linear independence 
as applied to vectors in the plane. 

The two vectors $\mathbf{e}_x$ and $\mathbf{e}_y$ (sometimes denoted as $\mathbf{i}$ and $\mathbf{j}$) are linearly
independent because $\alpha\mathbf{e}_x + \beta\mathbf{e}_y = 0$ can be satisfied only if both $\alpha$ and $\beta$ are
zero. If one of them, say $\alpha$, were different from zero, one could divide the
equation by $\alpha$ and get

$$\mathbf{e}_x = -\frac{\beta}{\alpha}\,\mathbf{e}_y,$$

which is impossible because $\mathbf{e}_x$ and $\mathbf{e}_y$ cannot lie along the same line.

Example 6.1.1. The arrows in the plane are not the only kinds of vectors dealt
with in physics. For instance, consider the set of all linear functions, or polynomials
of degree one (or less), i.e., functions of the form $a_0 + a_1t$ where $a_0$ and $a_1$ are real
numbers and $t$ is an arbitrary variable. Let us call this set $\mathcal{P}_1[t]$, where $\mathcal{P}$ stands for
“polynomial,” 1 signifies the degree of these polynomials, and $t$ is just the variable
used. We can add two such polynomials and get a third one of the same form. We
can multiply any such polynomial by a real number and get another polynomial. In
fact, $\mathcal{P}_1[t]$ has all the properties of the vectors in a plane. We say that $\mathcal{P}_1[t]$ and
the vectors in a plane are isomorphic, which literally means they have the “same
shape.”

It is important to emphasize that two polynomials are equal if and only if all
their coefficients are equal. In particular, a polynomial is equal to zero only if it is so
for all values of $t$, i.e., only if its coefficients vanish. This immediately leads to the
fact that the two polynomials 1 and $t$ are linearly independent because if $\alpha + \beta t = 0$
(for all values of $t$), then $\alpha = \beta = 0$ (try $t = 0$ and $t = 1$). ■

It is easy to show that any three vectors in the plane are linearly dependent.
Figure 6.1 shows three arbitrary vectors drawn in a plane. From the tip of one
of the vectors ($\mathbf{a}_3$ in the figure), a line is drawn parallel to one of the other two
vectors such that it meets the third vector (or its extension) at point $D$. The
vectors $\overrightarrow{OD}$ and $\overrightarrow{DC}$ are proportional to $\mathbf{a}_1$ and $\mathbf{a}_2$, respectively, and their
sum is equal to $\mathbf{a}_3$. So we can write

$$\mathbf{a}_3 = \overrightarrow{OD} + \overrightarrow{DC} = \alpha\mathbf{a}_1 + \beta\mathbf{a}_2 \quad\Longrightarrow\quad \alpha\mathbf{a}_1 + \beta\mathbf{a}_2 - \mathbf{a}_3 = 0,$$

and $\mathbf{a}_1$, $\mathbf{a}_2$, and $\mathbf{a}_3$ are linearly dependent.

Figure 6.1: Any three vectors $\mathbf{a}_1$, $\mathbf{a}_2$, and $\mathbf{a}_3$ in the plane are linearly dependent.

Clearly we cannot do the same with two arbitrary vectors. Thus


Box 6.1.1. The maximum number of linearly independent vectors in a 
plane is two. Any vector in a plane can be written as a linear combination 
of only two non-collinear (not lying along the same line) vectors. 


We also say that any two non-collinear vectors span the plane. 
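The argument of Figure 6.1 can be sketched numerically. The following minimal Python example (the function name `expand_in_pair` and the sample vectors are illustrative choices, not from the text) assumes plane vectors are given by integer Cartesian components and uses Cramer's rule to find the $\alpha$ and $\beta$ with $\mathbf{a}_3 = \alpha\mathbf{a}_1 + \beta\mathbf{a}_2$. A zero determinant signals that $\mathbf{a}_1$ and $\mathbf{a}_2$ are collinear and therefore do not span the plane.

```python
from fractions import Fraction


def expand_in_pair(a1, a2, a3):
    """Return (alpha, beta) such that a3 = alpha*a1 + beta*a2.

    Vectors are (x, y) pairs of integer Cartesian components.
    Raises ValueError if a1 and a2 are collinear (they do not span the plane).
    """
    det = a1[0] * a2[1] - a1[1] * a2[0]
    if det == 0:
        raise ValueError("a1 and a2 are collinear")
    # Cramer's rule for the 2x2 system alpha*a1 + beta*a2 = a3
    alpha = Fraction(a3[0] * a2[1] - a3[1] * a2[0], det)
    beta = Fraction(a1[0] * a3[1] - a1[1] * a3[0], det)
    return alpha, beta


a1, a2, a3 = (2, 1), (1, 3), (4, 7)
alpha, beta = expand_in_pair(a1, a2, a3)
print(alpha, beta)  # 1 2, so a1 + 2*a2 - a3 = 0 is a nontrivial vanishing combination
```

Exact fractions are used so that the vanishing of the combination can be tested without rounding error.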

Suppose that we can write a vector $\mathbf{a}$ as a linear combination of $n$ vectors

$$\mathbf{a} = \alpha_1\mathbf{a}_1 + \alpha_2\mathbf{a}_2 + \cdots + \alpha_n\mathbf{a}_n.$$

We want to see under what conditions the coefficients are unique. Suppose
that we can also write

$$\mathbf{a} = \beta_1\mathbf{a}_1 + \beta_2\mathbf{a}_2 + \cdots + \beta_n\mathbf{a}_n,$$

where the $\beta$'s are different from the $\alpha$'s. Then, subtracting these two linear
combinations, we get

$$0 = (\alpha_1 - \beta_1)\mathbf{a}_1 + (\alpha_2 - \beta_2)\mathbf{a}_2 + \cdots + (\alpha_n - \beta_n)\mathbf{a}_n.$$

Since not all the differences $\alpha_i - \beta_i$ vanish, this is possible only if the vectors are linearly dependent. Therefore, if we
want the coefficients to be unique, the vectors have to be linearly independent.
In particular, we can have at most two such vectors in the plane. Thus,
choosing any two linearly independent vectors $\mathbf{a}_1$ and $\mathbf{a}_2$ in the plane, we can
expand any other vector uniquely as a linear combination of $\mathbf{a}_1$ and $\mathbf{a}_2$. This
brings us to the notion of a basis.


Box 6.1.2. Vectors that span the plane and are linearly independent are 
called a basis for the plane. 


The foregoing argument showed that any two non-collinear vectors form a 
basis for the plane. 

With the notion of a basis comes the concept of components of a vector.
Given a basis, there is a unique way in which a particular vector can be written
in terms of the vectors in the basis. The unique coefficients of the basis vectors
are called the components of the particular vector in that basis. Another
concept associated with the basis is dimension, which is defined to be the
number of vectors in a basis. It follows that the plane has two dimensions.

Example 6.1.2. The components of $\mathbf{a}_3$ in the basis $\{\mathbf{a}_1, \mathbf{a}_2\}$ of Figure 6.1 are
$(\alpha, \beta)$.² Given any basis $\{\mathbf{a}_1, \mathbf{a}_2\}$ of the plane, it is readily seen that the components
of $\mathbf{a}_1$ are $(1, 0)$ and those of $\mathbf{a}_2$ are $(0, 1)$. ■

Example 6.1.3. The polynomials $\{1, t\}$ form a basis for $\mathcal{P}_1[t]$, because they are
linearly independent and they span $\mathcal{P}_1[t]$. Therefore, $\mathcal{P}_1[t]$ is a two-dimensional
vector space. The components of $\mathbf{f} = a_0 + a_1t$ are $(a_0, a_1)$ in this basis. How do
we determine the components of $\mathbf{f}$ in another basis $\{\mathbf{a}_1, \mathbf{a}_2\}$ with $\mathbf{a}_1 = 1 + t$ and
$\mathbf{a}_2 = 1 - t$? Since $\{\mathbf{a}_1, \mathbf{a}_2\}$ is a basis, we can write

$$\mathbf{f} = x_1\mathbf{a}_1 + x_2\mathbf{a}_2 = x_1(1 + t) + x_2(1 - t) = (x_1 + x_2) + (x_1 - x_2)t$$

or

$$a_0 + a_1t = (x_1 + x_2) + (x_1 - x_2)t \;\Longrightarrow\; (a_0 - x_1 - x_2)\cdot 1 + (a_1 - x_1 + x_2)t = 0.$$

The linear independence of 1 and $t$ now tells us that the coefficients of 1 and $t$ should
vanish. This leads to two equations in two unknowns:

$$x_1 + x_2 = a_0, \qquad x_1 - x_2 = a_1.$$

The solutions of these equations are easily found to be

$$x_1 = \tfrac{1}{2}(a_0 + a_1), \qquad x_2 = \tfrac{1}{2}(a_0 - a_1).$$

Thus, the components of $\mathbf{f}$ are $\left(\tfrac{1}{2}(a_0 + a_1), \tfrac{1}{2}(a_0 - a_1)\right)$ in the new basis. ■
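A quick exact-arithmetic check of Example 6.1.3 is sketched below. The helpers are hypothetical names of our own: one computes the components of $a_0 + a_1t$ in the basis $\{1 + t, 1 - t\}$ (the coefficients are called `c0`, `c1` in the code), and the other rebuilds the polynomial from those components. Representing a polynomial by its coefficient pair in the basis $\{1, t\}$ is an assumption of this sketch.

```python
from fractions import Fraction


def components_in_new_basis(c0, c1):
    """Components (x1, x2) of f = c0 + c1*t in the basis {1 + t, 1 - t}.

    Implements x1 = (c0 + c1)/2 and x2 = (c0 - c1)/2 from Example 6.1.3.
    """
    return Fraction(c0 + c1, 2), Fraction(c0 - c1, 2)


def rebuild(x1, x2):
    """Coefficients (c0, c1) of x1*(1 + t) + x2*(1 - t) in the basis {1, t}."""
    return (x1 + x2, x1 - x2)


x1, x2 = components_in_new_basis(3, 5)   # f = 3 + 5t
print(x1, x2)                            # 4 -1
assert rebuild(x1, x2) == (3, 5)         # the unique expansion recovers f
```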

6.1.1 Transformation of Components 

There are infinitely many bases in a plane, because there are infinitely many 
pairs of vectors that are linearly independent. Therefore, there are infinitely 
many sets of components for any given vector, and it is desirable to be able 
to find a relation between any two such sets. Such a relation employs the 
machinery of matrices. 

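Consider a vector $\mathbf{a}$ with components $(\alpha_1, \alpha_2)$ in the basis $\{\mathbf{a}_1, \mathbf{a}_2\}$ and
components $(\alpha'_1, \alpha'_2)$ in the basis $\{\mathbf{a}'_1, \mathbf{a}'_2\}$. We can write

$$\mathbf{a} = \alpha_1\mathbf{a}_1 + \alpha_2\mathbf{a}_2 \quad\text{and}\quad \mathbf{a} = \alpha'_1\mathbf{a}'_1 + \alpha'_2\mathbf{a}'_2. \tag{6.2}$$

Since $\{\mathbf{a}'_1, \mathbf{a}'_2\}$ form a basis, any vector, in particular $\mathbf{a}_1$ or $\mathbf{a}_2$, can be written
in terms of them:

$$\mathbf{a}_1 = a_{11}\mathbf{a}'_1 + a_{21}\mathbf{a}'_2, \qquad \mathbf{a}_2 = a_{12}\mathbf{a}'_1 + a_{22}\mathbf{a}'_2, \tag{6.3}$$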

2 Since in this chapter we are dealing primarily with components (and not coordinates), 
we shall use parentheses—instead of angle brackets—to list the components. 
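where $(a_{11}, a_{21})$ and $(a_{12}, a_{22})$ are, respectively, the components of $\mathbf{a}_1$ and $\mathbf{a}_2$ in
the basis $\{\mathbf{a}'_1, \mathbf{a}'_2\}$. Combining Equations (6.2) and (6.3), we obtain

$$\alpha_1(a_{11}\mathbf{a}'_1 + a_{21}\mathbf{a}'_2) + \alpha_2(a_{12}\mathbf{a}'_1 + a_{22}\mathbf{a}'_2) = \alpha'_1\mathbf{a}'_1 + \alpha'_2\mathbf{a}'_2$$

or

$$(\alpha'_1 - a_{11}\alpha_1 - a_{12}\alpha_2)\mathbf{a}'_1 + (\alpha'_2 - a_{21}\alpha_1 - a_{22}\alpha_2)\mathbf{a}'_2 = 0.$$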


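The linear independence of $\mathbf{a}'_1$ and $\mathbf{a}'_2$ gives

$$\alpha'_1 = a_{11}\alpha_1 + a_{12}\alpha_2, \qquad \alpha'_2 = a_{21}\alpha_1 + a_{22}\alpha_2. \tag{6.4}$$

These equations can be written concisely as³

$$\begin{pmatrix}\alpha'_1\\ \alpha'_2\end{pmatrix} = \begin{pmatrix}a_{11} & a_{12}\\ a_{21} & a_{22}\end{pmatrix}\begin{pmatrix}\alpha_1\\ \alpha_2\end{pmatrix} \quad\text{or}\quad \mathsf{a}' = \mathsf{A}\mathsf{a}, \tag{6.5}$$

where we have introduced the matrices

$$\mathsf{a} \equiv \begin{pmatrix}\alpha_1\\ \alpha_2\end{pmatrix}, \qquad \mathsf{a}' \equiv \begin{pmatrix}\alpha'_1\\ \alpha'_2\end{pmatrix}, \qquad \mathsf{A} \equiv \begin{pmatrix}a_{11} & a_{12}\\ a_{21} & a_{22}\end{pmatrix}. \tag{6.6}$$

The matrices $\mathsf{a}$ and $\mathsf{a}'$ are called column vectors or $2 \times 1$ matrices because
they each have two rows and one column. Similarly, $\mathsf{A}$ is called a $2 \times 2$ matrix.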
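To see Equation (6.5) at work, here is a minimal sketch that assumes both bases are specified by Cartesian components. The helper names and the sample bases are illustrative and do not come from the text. Column $j$ of $\mathsf{A}$ holds the components of $\mathbf{a}_j$ in the primed basis, exactly as in (6.3), and the final assertion confirms that $\alpha_1\mathbf{a}_1 + \alpha_2\mathbf{a}_2$ and $\alpha'_1\mathbf{a}'_1 + \alpha'_2\mathbf{a}'_2$ are the same arrow.

```python
from fractions import Fraction


def coords(v, b1, b2):
    """Components of the plane vector v in the basis {b1, b2} (Cramer's rule).

    Assumes b1 and b2 are non-collinear, so the determinant is nonzero.
    """
    det = b1[0] * b2[1] - b1[1] * b2[0]
    return (Fraction(v[0] * b2[1] - v[1] * b2[0], det),
            Fraction(b1[0] * v[1] - b1[1] * v[0], det))


def transformation_matrix(a1, a2, p1, p2):
    """Matrix A of Eq. (6.5): column j holds the components of a_j in {p1, p2}."""
    c1, c2 = coords(a1, p1, p2), coords(a2, p1, p2)
    return [[c1[0], c2[0]],
            [c1[1], c2[1]]]


def apply_matrix(A, col):
    """The product A col for a 2x2 matrix and a column of two components."""
    return [A[0][0] * col[0] + A[0][1] * col[1],
            A[1][0] * col[0] + A[1][1] * col[1]]


def combine(coeffs, b1, b2):
    """Cartesian components of coeffs[0]*b1 + coeffs[1]*b2."""
    return (coeffs[0] * b1[0] + coeffs[1] * b2[0],
            coeffs[0] * b1[1] + coeffs[1] * b2[1])


a1, a2 = (1, 0), (1, 1)      # old basis (illustrative)
p1, p2 = (2, 1), (0, 1)      # new, primed basis (illustrative)
alpha = (3, -2)              # components of a in the old basis

A = transformation_matrix(a1, a2, p1, p2)
alpha_p = apply_matrix(A, alpha)   # Eq. (6.5): a' = A a

old, new = combine(alpha, a1, a2), combine(alpha_p, p1, p2)
assert old == new                  # same vector, two sets of components
print(alpha_p)
```

This is the passive reading of (6.5): the vector is fixed and only its components change with the basis.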

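Let us now choose a third basis, $\{\mathbf{a}''_1, \mathbf{a}''_2\}$, and write $\mathbf{a} = \alpha''_1\mathbf{a}''_1 + \alpha''_2\mathbf{a}''_2$. If
$(a'_{11}, a'_{21})$ and $(a'_{12}, a'_{22})$ are, respectively, the components of $\mathbf{a}'_1$ and $\mathbf{a}'_2$ in this
third basis, then

$$\mathbf{a}'_1 = a'_{11}\mathbf{a}''_1 + a'_{21}\mathbf{a}''_2, \qquad \mathbf{a}'_2 = a'_{12}\mathbf{a}''_1 + a'_{22}\mathbf{a}''_2.$$

Substituting these in the second equation of (6.2) and equating the result to
$\mathbf{a} = \alpha''_1\mathbf{a}''_1 + \alpha''_2\mathbf{a}''_2$ yields

$$\alpha''_1 = a'_{11}\alpha'_1 + a'_{12}\alpha'_2, \qquad \alpha''_2 = a'_{21}\alpha'_1 + a'_{22}\alpha'_2. \tag{6.7}$$

We can write Equation (6.7) in matrix form:

$$\begin{pmatrix}\alpha''_1\\ \alpha''_2\end{pmatrix} = \begin{pmatrix}a'_{11} & a'_{12}\\ a'_{21} & a'_{22}\end{pmatrix}\begin{pmatrix}\alpha'_1\\ \alpha'_2\end{pmatrix} \quad\text{or}\quad \mathsf{a}'' = \mathsf{A}'\mathsf{a}', \tag{6.8}$$

where

$$\mathsf{a}'' \equiv \begin{pmatrix}\alpha''_1\\ \alpha''_2\end{pmatrix}, \qquad \mathsf{A}' \equiv \begin{pmatrix}a'_{11} & a'_{12}\\ a'_{21} & a'_{22}\end{pmatrix}, \tag{6.9}$$

and $\mathsf{a}'$ is as defined before.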


3 At this point, think of Equation (6.5) as a short-hand way of writing Equation (6.4). 
Further significance of this notation will become clear after Box 6.1.3. 

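We can also discover how $\mathsf{a}''$ and $\mathsf{a}$ are related by substituting (6.4) in
(6.7). This leads to the equations

$$\begin{aligned}\alpha''_1 &= (a'_{11}a_{11} + a'_{12}a_{21})\alpha_1 + (a'_{11}a_{12} + a'_{12}a_{22})\alpha_2,\\ \alpha''_2 &= (a'_{21}a_{11} + a'_{22}a_{21})\alpha_1 + (a'_{21}a_{12} + a'_{22}a_{22})\alpha_2,\end{aligned}$$

which, in matrix form, become

$$\begin{pmatrix}\alpha''_1\\ \alpha''_2\end{pmatrix} = \mathsf{A}''\begin{pmatrix}\alpha_1\\ \alpha_2\end{pmatrix},$$

where

$$\mathsf{A}'' = \begin{pmatrix}a'_{11}a_{11} + a'_{12}a_{21} & a'_{11}a_{12} + a'_{12}a_{22}\\ a'_{21}a_{11} + a'_{22}a_{21} & a'_{21}a_{12} + a'_{22}a_{22}\end{pmatrix}. \tag{6.10}$$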


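On the other hand, the matrix equations (6.8) and (6.5) yield $\mathsf{a}'' = \mathsf{A}'(\mathsf{A}\mathsf{a})$,
which is consistent with Equation (6.10) only if matrix multiplication is
defined so that $\mathsf{A}'' = \mathsf{A}'\mathsf{A}$, i.e.,

$$\begin{pmatrix}a'_{11} & a'_{12}\\ a'_{21} & a'_{22}\end{pmatrix}\begin{pmatrix}a_{11} & a_{12}\\ a_{21} & a_{22}\end{pmatrix} = \begin{pmatrix}a'_{11}a_{11} + a'_{12}a_{21} & a'_{11}a_{12} + a'_{12}a_{22}\\ a'_{21}a_{11} + a'_{22}a_{21} & a'_{21}a_{12} + a'_{22}a_{22}\end{pmatrix}. \tag{6.11}$$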
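The consistency requirement behind Equation (6.11) can be checked numerically. In this sketch the entries of $\mathsf{A}$ and $\mathsf{A}'$ are arbitrary illustrative numbers. Transforming components in two steps, first by (6.4) and then by (6.7), gives the same result as a single step with the matrix whose entries are written out in (6.11).

```python
def transform(M, col):
    """Components transform as in Eqs. (6.4) and (6.7): col' = M col."""
    return (M[0][0] * col[0] + M[0][1] * col[1],
            M[1][0] * col[0] + M[1][1] * col[1])


def compose(Ap, A):
    """Entries of A'' = A'A, written out exactly as in Eq. (6.11)."""
    return ((Ap[0][0] * A[0][0] + Ap[0][1] * A[1][0],
             Ap[0][0] * A[0][1] + Ap[0][1] * A[1][1]),
            (Ap[1][0] * A[0][0] + Ap[1][1] * A[1][0],
             Ap[1][0] * A[0][1] + Ap[1][1] * A[1][1]))


A = ((1, 2), (3, 4))      # illustrative entries a_ij
Ap = ((0, 1), (-1, 5))    # illustrative entries a'_ij
alpha = (2, -1)           # components in the first basis

two_steps = transform(Ap, transform(A, alpha))   # a'' = A'(A a)
one_step = transform(compose(Ap, A), alpha)      # a'' = (A'A) a
assert two_steps == one_step
print(one_step)   # (2, 10)
```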

All discussions and all the equations obtained so far are based on fixing
a vector and looking at its components in different bases. However, there is
another, more physical, way of interpreting these equations. Consider (6.5).
Here the column vector on the RHS represents the components of a vector
$\mathbf{a}$ in the basis $\{\mathbf{a}_1, \mathbf{a}_2\}$. Applying the matrix $\mathsf{A}$ to this column vector
yields a new column vector given on the LHS, which can be interpreted as
the components of a new vector $\mathbf{a}'$ in the same basis. So, in essence, we have
changed the vector $\mathbf{a}$ into a new vector $\mathbf{a}'$ via the transformation $\mathsf{A}$. The first
interpretation mentioned above is called a passive transformation ($\mathbf{a}$ remains
“passively” unchanged as the basis vectors are altered); the second interpretation
is called an active transformation ($\mathbf{a}$ is actively changed into $\mathbf{a}'$). We shall
have occasion to employ both interpretations. However, the active transformation
is more direct, and we shall use it more often. The reader may
convince himself or herself that a passive transformation in one “direction” is
completely equivalent to an active transformation in the “opposite” direction.
A good example to keep in mind is the rotation of axes (passive rotation)
versus the rotation of a vector (active rotation) in the plane, as shown in
Figure 6.2.

Equation (6.11) defines the “product” of two matrices in a prescribed manner. To find the entry in the first row and first column of the product, multiply
the entries of the first row of the first matrix by the corresponding entries of
the first column of the second matrix and add the terms thus obtained. To
find the entry in the first row and second column of the product, multiply
the entries of the first row of the first matrix by the corresponding entries of
the second column of the second matrix and add the terms. Other entries
are found similarly. This leads us to the following rule, which applies to all
matrices, not just those that are $2 \times 2$:
Figure 6.2: (a) A vector a in a coordinate system Oxy can be (b) actively transformed
to a new vector a' in the same coordinate system, or (c) passively transformed to a 
new coordinate system O'x'y'. Note that the relation of a' to Oxy is identical to the 
relation of a to O'x'y'. 


Box 6.1.3. ( Matrix Multiplication Rule). To obtain the entry in the 
ith row and jth column of the product of two matrices, multiply the entries 
of the ith row of the matrix on the left by the corresponding entries of the 
jth column of the matrix on the right and add the products thus obtained. 


For this rule to make sense, the number of entries in a row of the matrix on 
the left must equal the number of entries in a column of the matrix on the 
right. 
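Box 6.1.3 translates almost word for word into code. The following is a minimal sketch in pure Python; the function name `matmul` and the list-of-rows representation are our own choices. It raises an error when the rows of the left matrix and the columns of the right matrix have different lengths, which is the condition just stated. The sample matrices are those of Example 6.1.4.

```python
def matmul(left, right):
    """Multiply matrices given as lists of rows, following Box 6.1.3.

    Entry (i, j) of the product is the sum over k of left[i][k] * right[k][j].
    """
    # A row of the left matrix must have as many entries as a column of the right one
    if any(len(row) != len(right) for row in left):
        raise ValueError("row length of left matrix must equal column length of right matrix")
    n_cols = len(right[0])
    return [[sum(row[k] * right[k][j] for k in range(len(right))) for j in range(n_cols)]
            for row in left]


A = [[1, -1], [2, 3]]
B = [[-1, 0], [1, 2]]
print(matmul(A, B))  # [[-2, -2], [1, 6]]
print(matmul(B, A))  # [[-1, 1], [5, 5]]
```

The same function handles column vectors (lists of one-element rows) and row vectors (a single row), since it only checks the row-times-column compatibility.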

We identified a column vector as a 2 x 1 matrix. With this identification, 
the RHS of Equation (6.5) can be interpreted as the product of two matrices, 
a 2 x 2 matrix and a 2 x 1 matrix, resulting in a 2 x 1 matrix, the column 
vector on the LHS. 

Matrices were obtained in a natural way in the discussion of basis changes,
and the natural operation that ensued was multiplication. Once a mathematical entity is created in this manner, a full mathematical structure becomes
irresistibly enticing. For example, such operations as addition, subtraction,
division, and inversion also demand our attention. We now consider such
operations.

First, we need to define the equality of matrices: Two matrices are equal
if they have the same number of rows and columns, and their corresponding
elements are equal. Addition of two matrices is defined if they have the same
number of rows and columns, in which case the sum is defined to be the sum
of corresponding elements. A $2 \times 2$ matrix can be added to another $2 \times 2$
matrix, but not to a column vector. Thus, if

$$\mathsf{A} = \begin{pmatrix}a_{11} & a_{12}\\ a_{21} & a_{22}\end{pmatrix} \quad\text{and}\quad \mathsf{B} = \begin{pmatrix}b_{11} & b_{12}\\ b_{21} & b_{22}\end{pmatrix},$$

then

$$\mathsf{A} + \mathsf{B} = \begin{pmatrix}a_{11} + b_{11} & a_{12} + b_{12}\\ a_{21} + b_{21} & a_{22} + b_{22}\end{pmatrix}.$$

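From the definition of the sum and the product of matrices, it is clear that
addition is always commutative but the product need not be:

$$\mathsf{A} + \mathsf{B} = \mathsf{B} + \mathsf{A} \quad\text{but, in general,}\quad \mathsf{A}\mathsf{B} \neq \mathsf{B}\mathsf{A}. \tag{6.12}$$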



We can turn the set of $2 \times 2$ matrices into a vector space by defining the
product of a number and a matrix as a new matrix whose elements are the old
elements times the number. The zero “vector” is simply the zero matrix, the
$2 \times 2$ matrix all of whose elements are zero. The reader may verify that
all the usual operations of vectors apply to this set.⁴ If you multiply a matrix
by the number 0, you get the zero matrix.

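Example 6.1.4. Suppose

$$\mathsf{A} = \begin{pmatrix}1 & -1\\ 2 & 3\end{pmatrix} \quad\text{and}\quad \mathsf{B} = \begin{pmatrix}-1 & 0\\ 1 & 2\end{pmatrix}.$$

Then

$$\mathsf{A} + \mathsf{B} = \begin{pmatrix}1 - 1 & -1 + 0\\ 2 + 1 & 3 + 2\end{pmatrix} = \begin{pmatrix}0 & -1\\ 3 & 5\end{pmatrix} = \mathsf{B} + \mathsf{A}$$

and

$$\mathsf{A}\mathsf{B} = \begin{pmatrix}1\cdot(-1) + (-1)\cdot 1 & 1\cdot 0 + (-1)\cdot 2\\ 2\cdot(-1) + 3\cdot 1 & 2\cdot 0 + 3\cdot 2\end{pmatrix} = \begin{pmatrix}-2 & -2\\ 1 & 6\end{pmatrix},$$

while

$$\mathsf{B}\mathsf{A} = \begin{pmatrix}(-1)\cdot 1 + 0\cdot 2 & (-1)\cdot(-1) + 0\cdot 3\\ 1\cdot 1 + 2\cdot 2 & 1\cdot(-1) + 2\cdot 3\end{pmatrix} = \begin{pmatrix}-1 & 1\\ 5 & 5\end{pmatrix}.$$

Clearly, $\mathsf{A}\mathsf{B} \neq \mathsf{B}\mathsf{A}$. ■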


The $2 \times 2$ matrix

$$\begin{pmatrix}1 & 0\\ 0 & 1\end{pmatrix}$$

is called the $2 \times 2$ identity matrix or unit matrix, and has the property
that when it multiplies any other matrix (on the right or on the left), the
latter is not affected. The unit matrix is used to define the inverse of
a matrix $\mathsf{A}$ as a matrix $\mathsf{B}$ that multiplies $\mathsf{A}$ on either side and gives the unit
matrix. The inversion of a matrix is a much more complicated process than
that of ordinary numbers, and we shall discuss it at greater length later. At
this point, suffice it to say that, unlike numbers, not all nonzero matrices
have an inverse. For example, the reader can easily verify that the nonzero
matrix $\begin{pmatrix}1 & 0\\ 0 & 0\end{pmatrix}$ cannot have an inverse.

4 Note that the extra operation of multiplication of a matrix by another matrix is not 
part of the requirement for the set to be a vector space. 
We have introduced $2 \times 2$ and column (or $2 \times 1$) matrices. To complete
the picture, we also introduce a row vector, or a $1 \times 2$ matrix. The rule of
matrix multiplication allows the multiplication of a $2 \times 2$ matrix and a column
vector, as long as the latter is to the right of the former: You cannot multiply
a $2 \times 1$ matrix situated to the left of a $2 \times 2$ matrix. Similarly, you cannot
multiply two $2 \times 1$ matrices. However, the product of a row vector (a $1 \times 2$
matrix) and a column vector (a $2 \times 1$ matrix) is defined (as long as the latter
is to the right of the former), and the result is a $1 \times 1$ matrix, i.e., a number.
This is because we have only one row to the left of a single column. What
about the product of a row vector and a $2 \times 2$ matrix? As long as the matrix
is to the right of the row vector, the product is defined and the result is a row
vector.

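Example 6.1.5. With $\mathsf{A}$ and $\mathsf{B}$ as defined in Example 6.1.4 and

$$\mathsf{x} = \begin{pmatrix}1\\ -1\end{pmatrix}, \qquad \mathsf{y} = \begin{pmatrix}-1 & 2\end{pmatrix},$$

we have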



=2 =BA 


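$$\mathsf{y}\mathsf{A}\mathsf{B}\mathsf{x} = \begin{pmatrix}-1 & 2\end{pmatrix}\begin{pmatrix}-2 & -2\\ 1 & 6\end{pmatrix}\begin{pmatrix}1\\ -1\end{pmatrix} = \begin{pmatrix}4 & 14\end{pmatrix}\begin{pmatrix}1\\ -1\end{pmatrix} = -10.$$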

In the manipulations above, we have used the associativity of matrix multiplication 
and multiplied matrices in different orders without, of course, commuting them. 
Products such as Ay, By, yy, and xx are not defined; therefore, we have not considered 
them here. ■ 

There is a new operation on matrices which does not exist for ordinary 
numbers. This is called transposition and is defined as follows: 


Box 6.1.4. The transpose of a matrix is a new matrix whose rows are
the columns of the old matrix and whose columns are the rows of the old
matrix. The transpose of $\mathsf{A}$ is denoted by $\mathsf{A}^t$ or $\tilde{\mathsf{A}}$.




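Therefore

$$\mathsf{A} = \begin{pmatrix}a_{11} & a_{12}\\ a_{21} & a_{22}\end{pmatrix} \;\Longrightarrow\; \mathsf{A}^t = \tilde{\mathsf{A}} = \begin{pmatrix}a_{11} & a_{21}\\ a_{12} & a_{22}\end{pmatrix}.$$

If $\mathsf{A}^t = \mathsf{A}$, we say that $\mathsf{A}$ is symmetric.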

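Example 6.1.6. With $\mathsf{A}$, $\mathsf{B}$, $\mathsf{x}$, and $\mathsf{y}$ as defined in Example 6.1.5, we have

$$\mathsf{A}^t = \begin{pmatrix}1 & 2\\ -1 & 3\end{pmatrix}, \quad \mathsf{B}^t = \begin{pmatrix}-1 & 1\\ 0 & 2\end{pmatrix}, \quad \mathsf{x}^t = \begin{pmatrix}1 & -1\end{pmatrix}, \quad \mathsf{y}^t = \begin{pmatrix}-1\\ 2\end{pmatrix}.$$

Note that although $\mathsf{x}\mathsf{x}$ and $\mathsf{y}\mathsf{y}$ are not defined, all the combinations $\mathsf{x}^t\mathsf{x}$, $\mathsf{y}\mathsf{y}^t$, $\mathsf{x}\mathsf{x}^t$, and
$\mathsf{y}^t\mathsf{y}$ are defined: In the first two cases one gets a number, and in the last two cases a
$2 \times 2$ matrix. ■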


It should be clear from the definition of the transpose that 

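$$(\mathsf{A} + \mathsf{B})^t = \mathsf{A}^t + \mathsf{B}^t, \qquad (\mathsf{A}\mathsf{B})^t = \mathsf{B}^t\mathsf{A}^t, \qquad (\mathsf{A}^t)^t = \mathsf{A}. \tag{6.13}$$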

Of the three relations, the middle one is the least obvious, but the reader 
can verify it directly by choosing appropriate general matrices and carrying 
through the multiplications on both sides of the relation. 
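The middle relation in (6.13) is easy to test on the matrices of Example 6.1.4. In this minimal pure-Python sketch (helper names are our own), `transpose` swaps rows and columns as in Box 6.1.4, and `matmul` applies the row-times-column rule. The last line prints both $(\mathsf{A}\mathsf{B})^t$ and $\mathsf{A}^t\mathsf{B}^t$ to show that the order reversal matters.

```python
def transpose(M):
    """Rows of the result are the columns of M (Box 6.1.4)."""
    return [list(col) for col in zip(*M)]


def matmul(L, R):
    """Product by the row-times-column rule of Box 6.1.3."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*R)] for row in L]


A = [[1, -1], [2, 3]]
B = [[-1, 0], [1, 2]]

assert transpose(matmul(A, B)) == matmul(transpose(B), transpose(A))
assert transpose(transpose(A)) == A
# Keeping the original order fails here: (AB)^t differs from A^t B^t
print(transpose(matmul(A, B)), matmul(transpose(A), transpose(B)))
# [[-2, 1], [-2, 6]] [[-1, 5], [1, 5]]
```

Note that $\mathsf{A}^t\mathsf{B}^t$ is actually $(\mathsf{B}\mathsf{A})^t$, which is why it differs whenever $\mathsf{A}\mathsf{B} \neq \mathsf{B}\mathsf{A}$.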


6.1.2 Inner Product 


From our discussion of Chapter 1, we know that if $\mathbf{a}$ and $\mathbf{b}$ are vectors in the
plane having components $(a_x, a_y)$ and $(b_x, b_y)$ along the $x$- and $y$-axes, then
their dot product is

$$\mathbf{a}\cdot\mathbf{b} = a_xb_x + a_yb_y. \tag{6.14}$$

We want to generalize this dot product so that it applies to arbitrary bases. 
This generalization is called the inner product. 

Recall that any two non-collinear vectors $\{\mathbf{a}_1, \mathbf{a}_2\}$ in the plane form a
basis and any vector can be written as a linear combination of them, with
the unique coefficients being the components of the vector in the basis. In
particular, the components of $\mathbf{a}_1$ are $(1, 0)$ and those of $\mathbf{a}_2$ are $(0, 1)$. If we were
to define the dot product in terms of components, we would have to modify
Equation (6.14), because that equation would give zero for $\mathbf{a}_1\cdot\mathbf{a}_2$, which would
be inconsistent with (1.1) unless $\mathbf{a}_1$ and $\mathbf{a}_2$ happen to be perpendicular. How should we modify (6.14)? Since we want to
deal with components, a natural setting would be the language of matrices.
If $\mathsf{a}$ and $\mathsf{b}$ are the column vectors $\begin{pmatrix}a_x\\ a_y\end{pmatrix}$ and $\begin{pmatrix}b_x\\ b_y\end{pmatrix}$, respectively, then we can
rewrite Equation (6.14) as

$$\mathbf{a}\cdot\mathbf{b} = \mathsf{a}^t\mathsf{b} = \begin{pmatrix}a_x & a_y\end{pmatrix}\begin{pmatrix}b_x\\ b_y\end{pmatrix} = a_xb_x + a_yb_y. \tag{6.15}$$


It is this matrix relation that we want to generalize so that the result is the 
true dot product of vectors no matter what basis we choose in which to express 
our vectors. 
Besides the failure of Equation (6.15) for general bases, the demand for
generalization stems from another source: There are other kinds of “vectors”
that are not just arrows in the plane. For instance, the polynomials $\mathcal{P}_1[t]$ of
degree one that we introduced in Example 6.1.1 are such vectors. How do we
define inner products for these vectors? We cannot use Equation (1.1) because
neither the length of a polynomial nor the angle between two polynomials is
defined. In fact, both the length and the angle are defined only after an
inner product has been introduced. Furthermore, there is no guarantee that
Equation (6.15) will make sense.

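Let’s see how far we can go using the general properties of the inner product discussed at the beginning of Section 1.1.1. Write $\mathbf{a}$ and $\mathbf{b}$ as linear
combinations of the basis vectors $\{\mathbf{a}_1, \mathbf{a}_2\}$:

$$\mathbf{a} = \alpha_1\mathbf{a}_1 + \alpha_2\mathbf{a}_2, \qquad \mathbf{b} = \beta_1\mathbf{a}_1 + \beta_2\mathbf{a}_2.$$

Take the dot product of these vectors and write it in terms of the dot products
of the basis vectors:

$$\begin{aligned}\mathbf{a}\cdot\mathbf{b} &= (\alpha_1\mathbf{a}_1 + \alpha_2\mathbf{a}_2)\cdot(\beta_1\mathbf{a}_1 + \beta_2\mathbf{a}_2)\\ &= \alpha_1\beta_1\,\mathbf{a}_1\cdot\mathbf{a}_1 + \alpha_1\beta_2\,\mathbf{a}_1\cdot\mathbf{a}_2 + \alpha_2\beta_1\,\mathbf{a}_2\cdot\mathbf{a}_1 + \alpha_2\beta_2\,\mathbf{a}_2\cdot\mathbf{a}_2.\end{aligned}$$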

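Define a matrix $\mathsf{G}$ with elements

$$g_{11} = \mathbf{a}_1\cdot\mathbf{a}_1, \qquad g_{12} = \mathbf{a}_1\cdot\mathbf{a}_2 = \mathbf{a}_2\cdot\mathbf{a}_1 = g_{21}, \qquad g_{22} = \mathbf{a}_2\cdot\mathbf{a}_2.$$

Then, representing $\mathbf{a}$ and $\mathbf{b}$ as column vectors $\mathsf{a} = \begin{pmatrix}\alpha_1\\ \alpha_2\end{pmatrix}$ and $\mathsf{b} = \begin{pmatrix}\beta_1\\ \beta_2\end{pmatrix}$, the
dot product can be generalized to

$$\mathbf{a}\cdot\mathbf{b} = \mathsf{a}^t\mathsf{G}\mathsf{b} = \begin{pmatrix}\alpha_1 & \alpha_2\end{pmatrix}\begin{pmatrix}g_{11} & g_{12}\\ g_{21} & g_{22}\end{pmatrix}\begin{pmatrix}\beta_1\\ \beta_2\end{pmatrix}, \tag{6.16}$$

where $\mathsf{G}$ is a symmetric matrix.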
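Equation (6.16) can be checked against the Cartesian dot product. In this sketch the basis vectors are specified by Cartesian components, an assumption made only so that we have an independent answer to compare with. `metric` builds $\mathsf{G}$ from $g_{ij} = \mathbf{a}_i\cdot\mathbf{a}_j$, and the last assertion shows that the naive formula (6.14), applied to components in a non-orthonormal basis, gives a different number. All names are illustrative.

```python
def dot(u, v):
    """Cartesian dot product (6.14); valid because {(1, 0), (0, 1)} is orthonormal."""
    return u[0] * v[0] + u[1] * v[1]


def metric(a1, a2):
    """Inner product matrix G with g_ij = a_i . a_j."""
    basis = (a1, a2)
    return [[dot(u, v) for v in basis] for u in basis]


def inner(alpha, G, beta):
    """a . b = alpha^t G beta, Eq. (6.16), from components in the basis."""
    return sum(alpha[i] * G[i][j] * beta[j] for i in range(2) for j in range(2))


def combine(coeffs, a1, a2):
    """Cartesian components of coeffs[0]*a1 + coeffs[1]*a2."""
    return (coeffs[0] * a1[0] + coeffs[1] * a2[0],
            coeffs[0] * a1[1] + coeffs[1] * a2[1])


a1, a2 = (1, 0), (1, 2)          # a non-orthogonal basis chosen for illustration
alpha, beta = (3, -1), (2, 5)    # components of a and b in {a1, a2}
G = metric(a1, a2)
a, b = combine(alpha, a1, a2), combine(beta, a1, a2)

print(G)                                      # [[1, 1], [1, 5]]
assert inner(alpha, G, beta) == dot(a, b)     # both give -6
# The naive component formula gives 1 here, which is not the dot product
assert dot(alpha, beta) != inner(alpha, G, beta)
```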

Example 6.1.7. In this example, we shall define an inner product for the vectors
in $\mathcal{P}_1[t]$ that happens to be useful in physical applications. The idea is to find a rule
that takes two “vectors” in $\mathcal{P}_1[t]$ and gives a real number. Since the vectors in $\mathcal{P}_1[t]$
are functions (albeit of a very special kind), one natural way of getting numbers out
of functions is by integrating them. It turns out that this is indeed the most useful
way of defining the inner product for such polynomials. So, let $(a, b)$ be an interval
on the real line and let $\mathbf{f} = \alpha_0 + \alpha_1t$ and $\mathbf{g} = \beta_0 + \beta_1t$ be two “vectors” in $\mathcal{P}_1[t]$.
We define

$$\mathbf{f}\cdot\mathbf{g} = \int_a^b f(t)g(t)\,dt. \tag{6.17}$$

One can show that Equation (6.17) exhibits all the properties expected of an inner
product (as outlined in Section 1.1.1). For instance, $\mathbf{f}\cdot\mathbf{f}$ is positive for every nonzero $\mathbf{f}$ because
the integrand $[f(t)]^2$ is never negative and vanishes identically only when $f = 0$. Furthermore, $\mathbf{f}\cdot\mathbf{g} = \mathbf{g}\cdot\mathbf{f}$, and, as the reader
may check,

$$\mathbf{f}\cdot(\mathbf{g} + \mathbf{h}) = \mathbf{f}\cdot\mathbf{g} + \mathbf{f}\cdot\mathbf{h}.$$


These all indicate that we are on the right track.

We also note that the inner product depends on the interval chosen on the real
line. For different $(a, b)$, we get a different inner product. The choice is usually
dictated by the physical application. We shall choose $a = 0$, $b = 1$, although this
may not be a physically suitable choice. With such a choice and with $\{\mathbf{f}_1 = 1, \mathbf{f}_2 = t\}$
as a basis, we obtain

$$\begin{aligned}g_{11} &= \mathbf{f}_1\cdot\mathbf{f}_1 = \int_0^1 f_1(t)f_1(t)\,dt = \int_0^1 dt = 1,\\ g_{12} &= \mathbf{f}_1\cdot\mathbf{f}_2 = \int_0^1 f_1(t)f_2(t)\,dt = \int_0^1 t\,dt = \tfrac{1}{2} = g_{21},\\ g_{22} &= \mathbf{f}_2\cdot\mathbf{f}_2 = \int_0^1 f_2(t)f_2(t)\,dt = \int_0^1 t^2\,dt = \tfrac{1}{3}.\end{aligned}$$

So the inner product matrix is

$$\mathsf{G} = \begin{pmatrix}1 & \tfrac{1}{2}\\ \tfrac{1}{2} & \tfrac{1}{3}\end{pmatrix}.$$
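The entries of $\mathsf{G}$ above can be reproduced with exact rational arithmetic. The sketch below (the helper name `poly_inner` is ours) stores a polynomial as its list of coefficients and integrates term by term using $\int_a^b t^n\,dt = (b^{n+1} - a^{n+1})/(n+1)$. As a further check, $\mathsf{a}^t\mathsf{G}\mathsf{b}$ for the sample polynomials $\mathbf{f} = 2 + 3t$ and $\mathbf{g} = -1 + t$ (chosen only for illustration) agrees with the direct integral (6.17).

```python
from fractions import Fraction


def poly_inner(f, g, a=0, b=1):
    """Inner product (6.17) of two polynomials on [a, b].

    f and g are coefficient lists [c0, c1, ...] standing for c0 + c1*t + ...
    Each product term c_i d_j t**(i+j) is integrated exactly.
    """
    total = Fraction(0)
    for i, ci in enumerate(f):
        for j, dj in enumerate(g):
            n = i + j
            total += ci * dj * Fraction(b ** (n + 1) - a ** (n + 1), n + 1)
    return total


basis = ([1], [0, 1])          # f1 = 1, f2 = t
G = [[poly_inner(u, v) for v in basis] for u in basis]
print([[str(x) for x in row] for row in G])   # [['1', '1/2'], ['1/2', '1/3']]

# Components of f = 2 + 3t and g = -1 + t in the basis {1, t}
alpha, beta = (2, 3), (-1, 1)
via_G = sum(alpha[i] * G[i][j] * beta[j] for i in range(2) for j in range(2))
direct = poly_inner([2, 3], [-1, 1])
assert via_G == direct == Fraction(-3, 2)
```

Changing the default arguments `a` and `b` gives the inner product for another interval, which illustrates the remark that each interval defines a different inner product.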





We started with Equation (1.1) as the definition of the inner product. This 
definition assumed a knowledge of lengths and angles. These are notions with 
which we become intuitively familiar very early in our mental development. 
However, such notions are not intuitively obvious for two polynomials. That 
is why the concepts of lengths and angles for objects such as polynomials 
come after introducing the notion of inner product. Of course, we want these 
notions to agree with the intuitive notions of lengths and angles, i.e., we want 
them to be related to the inner product in precisely the same manner as given 
in Equation (1.1). If we let $\mathbf{b} = \mathbf{a}$ in that equation, we get $\mathbf{a}\cdot\mathbf{a} = |\mathbf{a}|^2$. This
becomes our definition for length: 


Box 6.1.5. Given any inner product on a set of objects that we can call
“vectors,” we define the length of a vector $\mathbf{a}$ as $|\mathbf{a}| = +\sqrt{\mathbf{a}\cdot\mathbf{a}}$.


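Once the notion of length is established for a general set of vectors, we
can define the angle between two vectors $\mathbf{a}$ and $\mathbf{b}$ as

$$\cos\theta = \frac{\mathbf{a}\cdot\mathbf{b}}{|\mathbf{a}|\,|\mathbf{b}|} = \frac{\mathbf{a}\cdot\mathbf{b}}{\sqrt{\mathbf{a}\cdot\mathbf{a}}\,\sqrt{\mathbf{b}\cdot\mathbf{b}}}. \tag{6.18}$$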


This equation and the one in Box 6.1.5 clearly show that lengths and angles
are given entirely in terms of inner products. For these concepts to be valid,
we must ensure that however we define the inner product, it has the
property that $\mathbf{a}\cdot\mathbf{a} > 0$ for a nonzero vector. It turns out that most inner
products encountered in applications have this property. Nevertheless, there
are cases (very important ones) for which $\mathbf{a}\cdot\mathbf{a} \le 0$ for some nonzero $\mathbf{a}$. In such cases, the concepts
of length and angle, as we know them, break down, and we have to be content
with “dot products” that may produce nonpositive numbers when a nonzero
vector is “dotted” with itself.

Even if $\mathbf{a}\cdot\mathbf{a} > 0$, there is no a priori guarantee that the cosine obtained in
Equation (6.18) will lie between $-1$ and $+1$, as it should. However, there is
a famous inequality in mathematics, called the Schwarz inequality, which
establishes this fact for those inner products that satisfy $\mathbf{a}\cdot\mathbf{a} > 0$. We shall
come back to this later in this chapter.

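Example 6.1.8. The lengths of the basis vectors $\{\mathbf{f}_1 = 1, \mathbf{f}_2 = t\}$ of $\mathcal{P}_1[t]$ can be
found easily using the results of Example 6.1.7:

$$|\mathbf{f}_1| = \sqrt{\mathbf{f}_1\cdot\mathbf{f}_1} = +\sqrt{1} = 1, \qquad |\mathbf{f}_2| = \sqrt{\mathbf{f}_2\cdot\mathbf{f}_2} = +\sqrt{\tfrac{1}{3}} = \tfrac{1}{\sqrt{3}}.$$

We can also find the “angle” between the two polynomials:

$$\cos\theta = \frac{\mathbf{f}_1\cdot\mathbf{f}_2}{|\mathbf{f}_1|\,|\mathbf{f}_2|} = \frac{1/2}{1\cdot(1/\sqrt{3})} = \frac{\sqrt{3}}{2} \;\Longrightarrow\; \theta = \frac{\pi}{6}.$$ ■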
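The numbers in Example 6.1.8 follow mechanically from $\mathsf{G}$. In this sketch (function names are ours), `length` implements Box 6.1.5 and `angle` implements Equation (6.18), both through $\mathsf{a}^t\mathsf{G}\mathsf{b}$ with the matrix of Example 6.1.7. The square roots are taken in floating point, so the comparison with $\pi/6$ is only approximate.

```python
import math
from fractions import Fraction

# Inner product matrix of Example 6.1.7 (basis f1 = 1, f2 = t on [0, 1])
G = [[Fraction(1), Fraction(1, 2)],
     [Fraction(1, 2), Fraction(1, 3)]]


def inner(alpha, beta):
    """alpha^t G beta for component pairs in the basis {1, t}."""
    return sum(alpha[i] * G[i][j] * beta[j] for i in range(2) for j in range(2))


def length(alpha):
    """Box 6.1.5: |a| = +sqrt(a . a)."""
    return math.sqrt(inner(alpha, alpha))


def angle(alpha, beta):
    """Eq. (6.18): the angle whose cosine is a . b / (|a| |b|)."""
    return math.acos(inner(alpha, beta) / (length(alpha) * length(beta)))


f1, f2 = (1, 0), (0, 1)
print(length(f1), length(f2))
print(angle(f1, f2), math.pi / 6)
```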


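The matrix $\mathsf{G}$, called the inner product matrix or metric matrix,
completely determines the inner product of vectors when they are written
as linear combinations of $\mathbf{a}_1$ and $\mathbf{a}_2$. For example, consider a vector $\mathbf{a}$ with
components $(\alpha_1, \alpha_2)$ in the basis $\{\mathbf{a}_1, \mathbf{a}_2\}$. Figure 6.3 shows $\mathbf{a}$ as the sum of
$\overrightarrow{OA}$ (which is the same as $\alpha_1\mathbf{a}_1$) and $\overrightarrow{OA'}$ (which is the same as $\alpha_2\mathbf{a}_2$). Using
the law of cosines for the triangle $OAP$, we get

$$\begin{aligned}|\mathbf{a}|^2 = \overline{OP}^2 &= \overline{OA}^2 + \overline{AP}^2 - 2\,\overline{OA}\;\overline{AP}\cos\varphi\\ &= \alpha_1^2|\mathbf{a}_1|^2 + \alpha_2^2|\mathbf{a}_2|^2 + 2\alpha_1\alpha_2|\mathbf{a}_1|\,|\mathbf{a}_2|\cos\theta_{12},\end{aligned}$$

where we have used $\varphi = \pi - \theta_{12}$, with $\theta_{12}$ the angle between $\mathbf{a}_1$ and $\mathbf{a}_2$.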



Figure 6.3: The length of a is the same whether we use the law of cosines or the inner
product matrix G.


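On the other hand, using Equation (6.16), we obtain

$$\begin{aligned}|\mathbf{a}|^2 = \mathbf{a}\cdot\mathbf{a} &= \begin{pmatrix}\alpha_1 & \alpha_2\end{pmatrix}\begin{pmatrix}g_{11} & g_{12}\\ g_{21} & g_{22}\end{pmatrix}\begin{pmatrix}\alpha_1\\ \alpha_2\end{pmatrix} = \begin{pmatrix}\alpha_1 & \alpha_2\end{pmatrix}\begin{pmatrix}g_{11}\alpha_1 + g_{12}\alpha_2\\ g_{21}\alpha_1 + g_{22}\alpha_2\end{pmatrix}\\ &= g_{11}\alpha_1^2 + 2g_{12}\alpha_1\alpha_2 + g_{22}\alpha_2^2\\ &= \mathbf{a}_1\cdot\mathbf{a}_1\,\alpha_1^2 + 2\alpha_1\alpha_2\,\mathbf{a}_1\cdot\mathbf{a}_2 + \mathbf{a}_2\cdot\mathbf{a}_2\,\alpha_2^2\\ &= |\mathbf{a}_1|^2\alpha_1^2 + 2\alpha_1\alpha_2|\mathbf{a}_1|\,|\mathbf{a}_2|\cos\theta_{12} + |\mathbf{a}_2|^2\alpha_2^2,\end{aligned}$$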

and the two expressions agree. In fact, we can show this agreement very 
generally: 

a • b = (oqai + a 2 a 2 ) • (P 1&1 + /? 2 a 2 ) 

= ct\[3iELi ■ ai + ai/3 2 ai • a 2 + a 2 /3 ia 2 • ai + ct 2 f3 2 Ei 2 • a 2 
= ai/?i5ii + ai/? 2 ffi 2 + a 2 f3i92i + a 2 (3 2 g 22 



where we used the distributive property of the inner product. 
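The identity $\mathbf a\cdot\mathbf b = \tilde{\mathsf a}\,\mathsf G\,\mathsf b$ is easy to check numerically. The sketch below is only an illustration: it picks a hypothetical non-orthonormal basis $\mathbf a_1 = \mathbf e_x$, $\mathbf a_2 = \mathbf e_x + 2\mathbf e_y$ (given by Cartesian coordinates), builds $\mathsf G$ from $g_{ij} = \mathbf a_i\cdot\mathbf a_j$, and compares $\tilde{\mathsf a}\,\mathsf G\,\mathsf b$ with the ordinary Cartesian dot product of the same two vectors. All function names are our own.

```python
def dot(u, v):
    """Ordinary Cartesian dot product of two coordinate tuples."""
    return sum(x * y for x, y in zip(u, v))


def gram_matrix(basis):
    """Inner product matrix G with g_ij = a_i . a_j."""
    return [[dot(ai, aj) for aj in basis] for ai in basis]


def inner_via_G(alpha, G, beta):
    """The row-matrix-column product alpha~ G beta, written as a double sum."""
    n = len(alpha)
    return sum(alpha[i] * G[i][j] * beta[j] for i in range(n) for j in range(n))


def to_cartesian(coeffs, basis):
    """Cartesian coordinates of sum_k coeffs[k] * basis[k]."""
    return tuple(sum(c * b[k] for c, b in zip(coeffs, basis))
                 for k in range(len(basis[0])))


# Hypothetical non-orthonormal basis: a1 = e_x, a2 = e_x + 2 e_y.
a1, a2 = (1, 0), (1, 2)
G = gram_matrix([a1, a2])            # [[1, 1], [1, 5]]

alpha, beta = (2, -1), (3, 4)        # components of a and b in {a1, a2}
a = to_cartesian(alpha, [a1, a2])    # (1, -2)
b = to_cartesian(beta, [a1, a2])     # (7, 8)

print(inner_via_G(alpha, G, beta), dot(a, b))     # -9 -9
print(inner_via_G(alpha, G, alpha), dot(a, a))    # 5 5
```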

It should now be clear to the reader that the matrix $\mathsf G$ contains all the information needed to evaluate the inner product of any pair of vectors. Suppose now that instead of $\{\mathbf a_1, \mathbf a_2\}$ we choose $\{\mathbf e_1, \mathbf e_2\}$, where $\mathbf e_1$ and $\mathbf e_2$ are unit vectors perpendicular to one another. Then the matrix $\mathsf G$ will have elements

$$g_{11} = \mathbf e_1\cdot\mathbf e_1 = 1, \qquad g_{12} = g_{21} = \mathbf e_1\cdot\mathbf e_2 = 0, \qquad g_{22} = \mathbf e_2\cdot\mathbf e_2 = 1,$$

i.e., $\mathsf G$ is the unit matrix. In that case, we obtain

$$\tilde{\mathsf a}\,\mathsf G\,\mathsf b = \begin{pmatrix}\alpha_1 & \alpha_2\end{pmatrix}\begin{pmatrix}1 & 0\\ 0 & 1\end{pmatrix}\begin{pmatrix}\beta_1\\ \beta_2\end{pmatrix} = \alpha_1\beta_1 + \alpha_2\beta_2,$$

which is the usual expression of the dot product of two vectors in terms of their components.

orthonormal basis

A basis whose vectors have unit length and are mutually perpendicular is called an orthonormal basis. Thus,


Box 6.1.6. Only in an orthonormal basis is the dot (inner) product of two 
vectors equal to the sum of the products of their corresponding components. 
In such a basis the inner product matrix G is the unit matrix. 


The matrix $\mathsf G$ was introduced to ensure the validity of the inner product in an arbitrary basis. This places some restrictions on $\mathsf G$; for example, we saw that it had to be symmetric, i.e., $g_{12} = g_{21}$, because of the symmetry of the dot product. Another restriction, if we want the dot product of every nonzero vector with itself to be positive, is that $\tilde{\mathsf a}\,\mathsf G\,\mathsf a > 0$ for every nonzero $\mathsf a$; in particular, $g_{11} > 0$ and $g_{22} > 0$. An inner product with this property is called positive definite (or Riemannian). It turns out, however, that such a restriction constrains $\mathsf G$ too much for some important physical applications. Although in most of this book we shall adhere to the usual positive definite or Euclidean inner product, the reader should be aware that non-Euclidean inner products also have important applications in physics.


Box 6.1.7. Regardless of the nature of $\mathsf G$, we call two vectors $\mathbf a$ and $\mathbf b$ G-orthogonal if $\mathbf a\cdot\mathbf b = \tilde{\mathsf a}\,\mathsf G\,\mathsf b = 0$.


Every point in the plane can be thought of as the tip of a vector whose tail is the origin. With this interpretation, we can express the ($\mathsf G$-dependent) distance between two points in terms of vectors. Let $\mathbf r_1$ be the vector to point $P_1$ and $\mathbf r_2$ the vector to point $P_2$. Then the “length” of the displacement vector $\Delta\mathbf r = \mathbf r_1 - \mathbf r_2$ is the “distance” between $P_1$ and $P_2$:

$$\overline{P_1P_2}^{\,2} = \Delta\mathbf r\cdot\Delta\mathbf r = (\mathbf r_1 - \mathbf r_2)\cdot(\mathbf r_1 - \mathbf r_2) = \widetilde{\Delta\mathsf r}\,\mathsf G\,\Delta\mathsf r. \qquad (6.19)$$

Keep in mind that only in the positive definite (Euclidean) case is $\overline{P_1P_2}^{\,2}$ nonnegative. There are physical situations in which the square of the length of the displacement vectors can be zero or even negative. We shall encounter one such example when we discuss the special theory of relativity.
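To see concretely how a $\mathsf G$ that is not positive definite can make the squared “distance” (6.19) vanish or turn negative, here is a small Python sketch. The matrix `G_indef` $= \mathrm{diag}(1, -1)$ and the points used are hypothetical choices made only for illustration; they are not taken from the text.

```python
def squared_distance(dr, G):
    """(Delta r)~ G (Delta r) for a displacement with components dr."""
    n = len(dr)
    return sum(dr[i] * G[i][j] * dr[j] for i in range(n) for j in range(n))


G_euclid = [[1, 0], [0, 1]]   # orthonormal basis, positive definite
G_indef = [[1, 0], [0, -1]]   # hypothetical indefinite inner product matrix

r1, r2 = (3, 5), (2, 4)
dr = tuple(x - y for x, y in zip(r1, r2))    # Delta r = r1 - r2 = (1, 1)

print(squared_distance(dr, G_euclid))        # 2
print(squared_distance(dr, G_indef))         # 0: nonzero displacement, zero "length"
print(squared_distance((1, 2), G_indef))     # -3: negative squared "length"
```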

The simplicity of $\mathsf G$ in orthonormal bases makes them highly desirable. So it is important to know whether it is always possible to construct orthonormal vectors out of general basis vectors. The construction should involve linear combinations only. In other words, given a basis $\{\mathbf a_1, \mathbf a_2\}$, we want to know if there are linear combinations of $\mathbf a_1$ and $\mathbf a_2$ which are orthonormal. We assume that the inner product is positive definite, so that the inner product of every nonzero vector with itself is positive. First we divide $\mathbf a_1$ by its length to get

$$\mathbf e_1 = \frac{\mathbf a_1}{|\mathbf a_1|} = \frac{\mathbf a_1}{\sqrt{\mathbf a_1\cdot\mathbf a_1}}.$$

To obtain the second orthonormal vector, we refer to Figure 6.4, which shows that if we take away from $\mathbf a_2$ its projection on $\mathbf a_1$, the remaining vector will be orthogonal to $\mathbf a_1$. So consider

$$\mathbf a_2' = \mathbf a_2 - \underbrace{(\mathbf a_2\cdot\mathbf e_1)\,\mathbf e_1}_{\text{projection of } \mathbf a_2 \text{ on } \mathbf a_1}$$

and note that

$$\mathbf e_1\cdot\mathbf a_2' = \mathbf e_1\cdot\mathbf a_2 - (\mathbf a_2\cdot\mathbf e_1)\underbrace{\mathbf e_1\cdot\mathbf e_1}_{=1} = 0.$$

positive definite, or Riemannian inner product

Gram-Schmidt process for the plane

Figure 6.4: The illustration of the Gram-Schmidt process for two linearly independent vectors in the plane.

This suggests defining $\mathbf e_2$ as

$$\mathbf e_2 = \frac{\mathbf a_2'}{|\mathbf a_2'|} = \frac{\mathbf a_2'}{\sqrt{\mathbf a_2'\cdot\mathbf a_2'}}.$$

The reader should note that in the construction of $\{\mathbf e_1, \mathbf e_2\}$, we have added vectors and multiplied them by numbers, i.e., we have taken linear combinations of $\mathbf a_1$ and $\mathbf a_2$. This process, and its generalization to an arbitrary number of vectors, is called the Gram-Schmidt process. It shows that, for a positive definite inner product, it is always possible to obtain orthonormal vectors from any linearly independent set of vectors by appropriately taking linear combinations.
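The two steps of the plane construction (normalize $\mathbf a_1$, then remove from $\mathbf a_2$ its projection on $\mathbf e_1$ and normalize the remainder) translate directly into code. This is a minimal sketch assuming the ordinary Euclidean dot product and vectors given by Cartesian coordinates; the input vectors are arbitrary examples, not from the text.

```python
import math


def dot(u, v):
    """Euclidean dot product of two coordinate tuples."""
    return sum(x * y for x, y in zip(u, v))


def gram_schmidt_2d(a1, a2):
    """Orthonormalize two linearly independent plane vectors (Euclidean dot product)."""
    n1 = math.sqrt(dot(a1, a1))
    e1 = tuple(x / n1 for x in a1)                   # e1 = a1 / |a1|
    p = dot(a2, e1)                                   # a2 . e1
    a2p = tuple(x - p * y for x, y in zip(a2, e1))    # a2' = a2 - (a2 . e1) e1
    n2 = math.sqrt(dot(a2p, a2p))
    e2 = tuple(x / n2 for x in a2p)                   # e2 = a2' / |a2'|
    return e1, e2


e1, e2 = gram_schmidt_2d((3.0, 4.0), (1.0, 2.0))
print(e1, e2)   # close to (0.6, 0.8) and (-0.8, 0.6), up to rounding
```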

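Example 6.1.9. The basis $\{1, t\}$ introduced for $\mathcal P_1[t]$ is not orthonormal when the inner product is integration over the interval $(0, 1)$ as in Example 6.1.7. Let us use the Gram-Schmidt process to find an orthonormal basis. We note that the first basis vector already has unit length; so we let $\mathbf e_1 = \mathbf f_1 = 1$. To find the second vector, we first construct

$$\mathbf f_2' = \mathbf f_2 - (\mathbf f_2\cdot\mathbf e_1)\,\mathbf e_1 = t - \left(\tfrac12\right)1 = t - \tfrac12,$$

with

$$|\mathbf f_2'|^2 = \mathbf f_2'\cdot\mathbf f_2' = \int_0^1\left(t - \tfrac12\right)^2 dt = \frac{1}{12}.$$

Then the second vector will be

$$\mathbf e_2 = \frac{\mathbf f_2'}{|\mathbf f_2'|} = \sqrt{12}\left(t - \tfrac12\right) = \sqrt 3\,(2t - 1).$$

The reader may verify directly that $\{\mathbf e_1, \mathbf e_2\}$ is an orthonormal basis.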
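The verification suggested at the end of Example 6.1.9 can also be done with exact arithmetic. The sketch below (helper names are our own) repeats the Gram-Schmidt step for polynomials stored as coefficient lists. To keep everything rational it checks $u = 2t - 1$ instead of $\mathbf e_2 = \sqrt 3\,u$, using $\mathbf e_1\cdot\mathbf e_2 = \sqrt 3\,(\mathbf e_1\cdot u)$ and $\mathbf e_2\cdot\mathbf e_2 = 3\,(u\cdot u)$.

```python
from fractions import Fraction
from itertools import zip_longest


def inner(p, q):
    """Integral over [0, 1] of p(t) q(t); polynomials as coefficient lists."""
    return sum(Fraction(a) * b / (i + j + 1)
               for i, a in enumerate(p) for j, b in enumerate(q))


e1 = [1]          # e1 = f1 = 1
f2 = [0, 1]       # f2 = t

proj = inner(f2, e1)                                                 # f2 . e1 = 1/2
f2p = [a - proj * b for a, b in zip_longest(f2, e1, fillvalue=0)]   # t - 1/2
norm_sq = inner(f2p, f2p)                                            # 1/12

# e2 = f2p / sqrt(1/12) = sqrt(3) (2t - 1); check u = 2t - 1 rationally.
u = [-1, 2]
print(proj, norm_sq, inner(e1, u), 3 * inner(u, u))   # 1/2 1/12 0 1
```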


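Example 6.1.10. Consider the vectors

$$\mathbf a_1 = \mathbf e_x + \mathbf e_y \qquad\text{and}\qquad \mathbf a_2 = 2\mathbf e_x + \mathbf e_y.$$

The inner product matrix elements in the basis $\{\mathbf a_1, \mathbf a_2\}$ are

$$g_{11} = \mathbf a_1\cdot\mathbf a_1 = (\mathbf e_x + \mathbf e_y)\cdot(\mathbf e_x + \mathbf e_y) = 2, \qquad g_{12} = (\mathbf e_x + \mathbf e_y)\cdot(2\mathbf e_x + \mathbf e_y) = 3,$$

$$g_{21} = \mathbf a_2\cdot\mathbf a_1 = g_{12} = 3, \qquad g_{22} = (2\mathbf e_x + \mathbf e_y)\cdot(2\mathbf e_x + \mathbf e_y) = 5,$$

or, in matrix form, $\mathsf G = \begin{pmatrix}2 & 3\\ 3 & 5\end{pmatrix}$.

Now consider vectors $\mathbf b$ and $\mathbf c$, whose components in $\{\mathbf a_1, \mathbf a_2\}$ are, respectively, $(1, 1)$ and $(-3, 2)$. We can compute the scalar product of $\mathbf b$ and $\mathbf c$ in terms of these components using Equation (6.16):

$$\mathbf b\cdot\mathbf c = \tilde{\mathsf b}\,\mathsf G\,\mathsf c = \begin{pmatrix}1 & 1\end{pmatrix}\begin{pmatrix}2 & 3\\ 3 & 5\end{pmatrix}\begin{pmatrix}-3\\ 2\end{pmatrix} = \begin{pmatrix}1 & 1\end{pmatrix}\begin{pmatrix}0\\ 1\end{pmatrix} = 1.$$

We can also write $\mathbf b$ and $\mathbf c$ in terms of $\mathbf e_x$ and $\mathbf e_y$ and use the usual definition of the inner product (in terms of components) to find $\mathbf b\cdot\mathbf c$. Since $\mathbf b$ has the components $(1, 1)$ in $\{\mathbf a_1, \mathbf a_2\}$, it can be written as

$$\mathbf b = \mathbf a_1 + \mathbf a_2 = (\mathbf e_x + \mathbf e_y) + (2\mathbf e_x + \mathbf e_y) = 3\mathbf e_x + 2\mathbf e_y.$$

Similarly,

$$\mathbf c = -3\mathbf a_1 + 2\mathbf a_2 = -3(\mathbf e_x + \mathbf e_y) + 2(2\mathbf e_x + \mathbf e_y) = \mathbf e_x - \mathbf e_y.$$

Thus, in $\{\mathbf e_x, \mathbf e_y\}$, $\mathbf b$ has components $(3, 2)$, and $\mathbf c$ has components $(1, -1)$. Then

$$\mathbf b\cdot\mathbf c = b_xc_x + b_yc_y = 3\cdot 1 + 2\cdot(-1) = 1,$$

which agrees with the result obtained above. ■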
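Both routes of Example 6.1.10 can be replayed in a few lines of Python. This is only a sketch of the computation above; the helper names are our own, and the vectors are stored by their Cartesian coordinates along $\mathbf e_x$ and $\mathbf e_y$.

```python
def dot(u, v):
    """Cartesian dot product in the orthonormal basis {e_x, e_y}."""
    return sum(x * y for x, y in zip(u, v))


a1, a2 = (1, 1), (2, 1)                     # a1 = e_x + e_y, a2 = 2 e_x + e_y
basis = (a1, a2)
G = [[dot(ai, aj) for aj in basis] for ai in basis]
print(G)                                    # [[2, 3], [3, 5]]

b_comp, c_comp = (1, 1), (-3, 2)            # components in {a1, a2}
via_G = sum(b_comp[i] * G[i][j] * c_comp[j] for i in range(2) for j in range(2))


def to_cartesian(comp):
    """Components along e_x, e_y of comp[0] * a1 + comp[1] * a2."""
    return tuple(comp[0] * x + comp[1] * y for x, y in zip(a1, a2))


b_cart, c_cart = to_cartesian(b_comp), to_cartesian(c_comp)
print(via_G, b_cart, c_cart, dot(b_cart, c_cart))   # 1 (3, 2) (1, -1) 1
```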


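Example 6.1.11. Consider two vectors $\mathbf f$ and $\mathbf g$ in $\mathcal P_1[t]$ with

$$\mathbf f \equiv f(t) = \alpha_0 + \alpha_1 t, \qquad \mathbf g \equiv g(t) = \beta_0 + \beta_1 t.$$

We want to find the inner product of these two vectors. First, we use the basis $\{1, t\}$ and its corresponding $\mathsf G$ matrix found in Example 6.1.7:

$$\mathbf f\cdot\mathbf g = \tilde{\mathsf f}\,\mathsf G\,\mathsf g = \begin{pmatrix}\alpha_0 & \alpha_1\end{pmatrix}\begin{pmatrix}1 & \frac12\\ \frac12 & \frac13\end{pmatrix}\begin{pmatrix}\beta_0\\ \beta_1\end{pmatrix} = \alpha_0\beta_0 + \tfrac12(\alpha_0\beta_1 + \alpha_1\beta_0) + \tfrac13\alpha_1\beta_1.$$

Next, we use the orthonormal basis found in Example 6.1.9. In this basis $\mathsf G$ is the identity matrix and the inner product is the usual one in terms of components. However, the components of $\mathbf f$ and $\mathbf g$ need to be found in $\{\mathbf e_1, \mathbf e_2\}$. The reader may check that

$$\mathbf f = \alpha_0 + \alpha_1 t = \alpha_0\mathbf e_1 + \alpha_1\left(\frac{1}{2\sqrt 3}\mathbf e_2 + \frac12\mathbf e_1\right) = \left(\alpha_0 + \tfrac12\alpha_1\right)\mathbf e_1 + \frac{\alpha_1}{2\sqrt 3}\mathbf e_2,$$

$$\mathbf g = \beta_0 + \beta_1 t = \left(\beta_0 + \tfrac12\beta_1\right)\mathbf e_1 + \frac{\beta_1}{2\sqrt 3}\mathbf e_2.$$

It then follows that

$$\mathbf f\cdot\mathbf g = \left(\alpha_0 + \tfrac12\alpha_1\right)\left(\beta_0 + \tfrac12\beta_1\right) + \frac{\alpha_1\beta_1}{12} = \alpha_0\beta_0 + \tfrac12(\alpha_0\beta_1 + \alpha_1\beta_0) + \tfrac13\alpha_1\beta_1.$$

Finally, we take the dot product of the two vectors using the definition of this dot product:

$$\mathbf f\cdot\mathbf g = \int_0^1(\alpha_0 + \alpha_1 t)(\beta_0 + \beta_1 t)\,dt = \alpha_0\beta_0\int_0^1 dt + (\alpha_0\beta_1 + \alpha_1\beta_0)\int_0^1 t\,dt + \alpha_1\beta_1\int_0^1 t^2\,dt$$

$$= \alpha_0\beta_0 + \tfrac12(\alpha_0\beta_1 + \alpha_1\beta_0) + \tfrac13\alpha_1\beta_1.$$

All three ways of calculating the inner product agree, as they should. ■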
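The agreement of the three methods can be spot-checked with concrete coefficients. In the sketch below the values $\alpha_0 = 2$, $\alpha_1 = 3$, $\beta_0 = -1$, $\beta_1 = 5$ are hypothetical, chosen only for illustration. Exact fractions are used; the orthonormal route stays rational because the two $\mathbf e_2$ components multiply to $\alpha_1\beta_1/12$.

```python
from fractions import Fraction as F

a0, a1 = F(2), F(3)     # hypothetical f(t) = 2 + 3t
b0, b1 = F(-1), F(5)    # hypothetical g(t) = -1 + 5t

# 1. Components in {1, t} together with G = [[1, 1/2], [1/2, 1/3]].
G = [[F(1), F(1, 2)], [F(1, 2), F(1, 3)]]
f, g = (a0, a1), (b0, b1)
way1 = sum(f[i] * G[i][j] * g[j] for i in range(2) for j in range(2))

# 2. Components in the orthonormal basis {e1, e2}: the e1 parts are
#    a0 + a1/2 and b0 + b1/2; the e2 parts multiply to a1*b1/12.
way2 = (a0 + a1 / 2) * (b0 + b1 / 2) + a1 * b1 / 12

# 3. Direct integration over [0, 1]: expand the product into coefficients
#    of 1, t, t^2 and use the integral of t^k, which is 1/(k + 1).
prod = [a0 * b0, a0 * b1 + a1 * b0, a1 * b1]
way3 = sum(c / (k + 1) for k, c in enumerate(prod))

print(way1, way2, way3)   # 13/2 13/2 13/2
```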


6.1.3 Orthogonal Transformation 

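Now that we have defined inner products, we may combine them with the concept of transformation. More specifically, we seek transformations that leave the inner product (which we shall assume to be positive definite, or Euclidean) unchanged. Under such transformations, the length of a vector and the angle between two vectors will not change. That is why such transformations are called rigid transformations. We choose an orthonormal basis, so that $\mathsf G = 1$, and denote the transformed vectors by a prime: $\mathsf a' = \mathsf A\mathsf a$, $\mathsf b' = \mathsf A\mathsf b$. Then the invariance of the inner product yields

$$\tilde{\mathsf a}'\mathsf b' = \tilde{\mathsf a}\mathsf b \quad\Longrightarrow\quad \widetilde{(\mathsf A\mathsf a)}\,\mathsf A\mathsf b = \tilde{\mathsf a}\tilde{\mathsf A}\mathsf A\mathsf b = \tilde{\mathsf a}\mathsf b.$$

This will hold for arbitrary $\mathsf a$ and $\mathsf b$ only if

$$\tilde{\mathsf A}\mathsf A = 1. \qquad (6.20)$$

Matrices that satisfy this relation are called orthogonal. We now investigate conditions under which Equation (6.20) holds by writing out the matrices:

$$\begin{pmatrix}a_{11} & a_{21}\\ a_{12} & a_{22}\end{pmatrix}\begin{pmatrix}a_{11} & a_{12}\\ a_{21} & a_{22}\end{pmatrix} = \begin{pmatrix}a_{11}^2 + a_{21}^2 & a_{11}a_{12} + a_{21}a_{22}\\ a_{12}a_{11} + a_{22}a_{21} & a_{12}^2 + a_{22}^2\end{pmatrix} = \begin{pmatrix}1 & 0\\ 0 & 1\end{pmatrix},$$

which is equivalent to the following three equations:

$$a_{11}^2 + a_{21}^2 = 1, \qquad a_{11}a_{12} + a_{21}a_{22} = 0, \qquad a_{12}^2 + a_{22}^2 = 1. \qquad (6.21)$$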

Squaring the second equation and substituting from the first and third, we get

$$a_{11}^2a_{12}^2 = a_{21}^2a_{22}^2 \;\Longrightarrow\; (1 - a_{21}^2)\,a_{12}^2 = a_{21}^2\,(1 - a_{12}^2) \;\Longrightarrow\; a_{12}^2 = a_{21}^2.$$

The first and third equations of (6.21) now yield

$$a_{22}^2 = a_{11}^2 \qquad\text{and}\qquad a_{12}^2 = a_{21}^2 = 1 - a_{11}^2.$$

Therefore, all parameters are given in terms of $a_{11}$. Now the first equation of (6.21) indicates that $-1 \le a_{11} \le 1$. It follows that $a_{11}$ can be thought of as the sine or the cosine of some angle, say $\theta$. Let us choose cosine. Then $a_{22} = \pm\cos\theta$. If we choose the plus sign for $a_{22}$, then the middle equation of (6.21) shows that $a_{12} = -a_{21} = \pm\sin\theta$, and if we choose the minus sign, $a_{12} = a_{21} = \pm\sin\theta$. Let us choose the plus sign. Then we obtain two possibilities for $\mathsf A$:

$$\mathsf A = \begin{pmatrix}\cos\theta & -\sin\theta\\ \sin\theta & \cos\theta\end{pmatrix} \qquad\text{or}\qquad \mathsf A = \begin{pmatrix}\cos\theta & \sin\theta\\ -\sin\theta & \cos\theta\end{pmatrix}.$$

The difference is in the sign of the angle $\theta$.

Writing $(x, y)$ for the components of a vector in the plane [instead of $(\alpha_1, \alpha_2)$], and $(x', y')$ for the transformed vector, and using the first choice for $\mathsf A$, we have

$$\begin{pmatrix}x'\\ y'\end{pmatrix} = \begin{pmatrix}\cos\theta & -\sin\theta\\ \sin\theta & \cos\theta\end{pmatrix}\begin{pmatrix}x\\ y\end{pmatrix}$$

or

$$x' = x\cos\theta - y\sin\theta, \qquad (6.22)$$

$$y' = x\sin\theta + y\cos\theta. \qquad (6.23)$$

This is how the coordinates of a point in the plane transform under a counterclockwise rotation of angle $\theta$. Had we chosen the second form of $\mathsf A$, we would have obtained a clockwise rotation of the coordinates. Notice how we chose the signs of sines and cosines to ensure that when $\theta = 0$, the rotation is the unit matrix, i.e., no rotation at all. Although rotations are orthogonal transformations, the converse is not true: There are orthogonal transformations that do not correspond to a rotation. For example, the matrix

$$\mathsf A = \begin{pmatrix}\cos\theta & \sin\theta\\ \sin\theta & -\cos\theta\end{pmatrix} \qquad (6.24)$$

is orthogonal (as the reader can verify), but it does not correspond to a rotation because at $\theta = 0$ it does not give the identity matrix.
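The verification left to the reader can be sketched numerically: the snippet below checks $\tilde{\mathsf A}\mathsf A = 1$ for the rotation matrix and for the matrix (6.24) at an arbitrary angle, and then evaluates both at $\theta = 0$. The helper names and the tolerance $10^{-12}$ are our own assumptions.

```python
import math


def transpose(M):
    return [list(row) for row in zip(*M)]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def is_identity(M, tol=1e-12):
    n = len(M)
    return all(abs(M[i][j] - (1.0 if i == j else 0.0)) < tol
               for i in range(n) for j in range(n))


def rotation(theta):
    """First form of A: counterclockwise rotation, Eqs. (6.22)-(6.23)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]


def reflection(theta):
    """The orthogonal matrix of Equation (6.24)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, s], [s, -c]]


theta = 0.7                                  # arbitrary test angle
R, S = rotation(theta), reflection(theta)
print(is_identity(matmul(transpose(R), R)))  # True: R is orthogonal
print(is_identity(matmul(transpose(S), S)))  # True: S is orthogonal too
print(is_identity(rotation(0.0)))            # True: no rotation at theta = 0
print(is_identity(reflection(0.0)))          # False: (6.24) is not the identity at theta = 0
```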

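In general, the inner product of the transformed (primed) vectors will be

$$\tilde{\mathsf a}'\mathsf G\mathsf b' = \widetilde{(\mathsf A\mathsf a)}\,\mathsf G\mathsf A\mathsf b = \tilde{\mathsf a}\tilde{\mathsf A}\mathsf G\mathsf A\mathsf b.$$

For $\mathsf A$ to preserve the inner product, i.e., for $\tilde{\mathsf a}'\mathsf G\mathsf b'$ to be equal to $\tilde{\mathsf a}\mathsf G\mathsf b$, we need to have

$$\tilde{\mathsf A}\mathsf G\mathsf A = \mathsf G. \qquad (6.25)$$

A matrix that satisfies Equation (6.25) is called G-orthogonal.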
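Whether a matrix is G-orthogonal depends on $\mathsf G$. As an illustration only, the sketch below takes the hypothetical indefinite matrix $\mathsf G = \mathrm{diag}(1, -1)$ and tests Equation (6.25) for a matrix built from hyperbolic functions, $\begin{pmatrix}\cosh u & \sinh u\\ \sinh u & \cosh u\end{pmatrix}$ (which works because $\cosh^2 u - \sinh^2 u = 1$), and for an ordinary rotation. Neither matrix comes from the text; they were chosen to show that a matrix can be G-orthogonal for one $\mathsf G$ but not another.

```python
import math


def transpose(M):
    return [list(row) for row in zip(*M)]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def close(M, N, tol=1e-12):
    return all(abs(x - y) < tol for r, s in zip(M, N) for x, y in zip(r, s))


def preserves(A, G):
    """True if A~ G A equals G (up to rounding), i.e. A is G-orthogonal, Eq. (6.25)."""
    return close(matmul(matmul(transpose(A), G), A), G)


def hyperbolic(u):
    ch, sh = math.cosh(u), math.sinh(u)
    return [[ch, sh], [sh, ch]]


def rotation(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]


G_indef = [[1.0, 0.0], [0.0, -1.0]]   # hypothetical indefinite G
G_unit = [[1.0, 0.0], [0.0, 1.0]]

print(preserves(hyperbolic(0.8), G_indef))   # True
print(preserves(rotation(0.7), G_indef))     # False
print(preserves(rotation(0.7), G_unit))      # True
```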

$2\times 2$ orthogonal matrices are described in terms of a single parameter.

G-orthogonal matrices

Matrices entered mathematics slowly and somewhat reluctantly. The related notion of determinant, which is a number associated with an array of numbers, was introduced as early as the middle of the eighteenth century in the study of a system of linear equations. However, the recognition that the array itself could be treated as a mathematical object, obeying certain rules of manipulation, came much later.

Arthur Cayley 1821-1895


Logically, the idea of a matrix precedes that of a determinant as Arthur Cayley 
has pointed out; however, the order was reversed historically. In fact, many of the 
properties of matrices were known as a result of their connection to determinants. 
Because the uses of matrices were well established, it occurred to Cayley to introduce 
them as distinct entities. He says, “I certainly did not get the notion of a matrix in 
any way through quaternions; it was either directly from that of a determinant or 
as a convenient way of expression of” a system of two equations in two unknowns. 
Because Cayley was the first to single out the matrix itself and was the first to 
publish a series of articles on them, he is generally credited with being the creator 
of the theory of matrices. 

Arthur Cayley’s father, Henry Cayley, although from a family who had lived 
for many generations in Yorkshire, England, lived in St. Petersburg, Russia. It was 
in St. Petersburg that Arthur spent the first eight years of his childhood before his 
parents returned to England and settled near London. Arthur showed great skill 
in numerical calculations at school and, after he moved to King’s College in 1835, 
his aptitude for advanced mathematics became apparent. His mathematics teacher 
advised that Arthur be encouraged to pursue his studies in this area rather than 
follow his father’s wishes to enter the family business as a merchant. 

In 1838 Arthur began his studies at Trinity College, Cambridge, from where he 
graduated in 1842. While still an undergraduate he had three papers published in 
the newly founded Cambridge Mathematical Journal. For four years he taught at 
Cambridge having won a Fellowship and, during this period, he published 28 papers. 

A Cambridge Fellowship had a limited tenure, so Cayley had to find a profession.
He chose law and was admitted to the bar in 1849. He spent 14 years as a lawyer but 
Cayley, although very skilled in conveyancing (his legal speciality), always considered 
it as a means to make money so that he could pursue mathematics. During this 
period he met Sylvester who was also in the legal profession. Both worked at the 
courts of Lincoln’s Inn in London and discussed deep mathematical questions during 
their working day. During these 14 years as a lawyer Cayley published about 250 
mathematical papers! 

In 1863 Cayley was appointed to the newly created Sadleirian professorship of 
mathematics at Cambridge. Except for the year 1882, spent at the Johns Hopkins 
University at the invitation of Sylvester, he remained at Cambridge until his death 
in 1895. 


6.2 Vectors in Space 

The ideas developed so far can be easily generalized to vectors in space. For example, a linear combination of vectors in space is again a vector in space. We can also find a basis for space. In fact, any three non-coplanar (not lying in the same plane) vectors constitute a basis. To see this, let $\{\mathbf a_1, \mathbf a_2, \mathbf a_3\}$ be three such vectors drawn from a common point$^{5}$ and assume that $\mathbf b$ is any fourth vector in space. If $\mathbf b$ is along any of the $\mathbf a$'s, we are done, because then $\mathbf b$ is a multiple of that vector, i.e., a linear combination of the three vectors (with two coefficients being zero). So assume that $\mathbf b$ is not along any of the $\mathbf a$'s. The plane formed by $\mathbf b$ and $\mathbf a_3$ intersects the plane of $\mathbf a_1$ and $\mathbf a_2$ along a certain line common to both (see Figure 6.5). Draw a line from the tip of $\mathbf b$ parallel to $\mathbf a_3$. This line will resolve $\mathbf b$ into a vector $\overrightarrow{OB}$ in the plane of $\mathbf a_1$ and $\mathbf a_2$ and a vector $\overrightarrow{BP}$ parallel to $\mathbf a_3$. So we write $\mathbf b = \overrightarrow{OB} + \alpha_3\mathbf a_3$. Furthermore, since $\overrightarrow{OB}$ is in the plane of $\mathbf a_1$ and $\mathbf a_2$, it can be written as a linear combination of these two vectors: $\overrightarrow{OB} = \alpha_1\mathbf a_1 + \alpha_2\mathbf a_2$. Putting all of this together, we get

$$\mathbf b = \alpha_1\mathbf a_1 + \alpha_2\mathbf a_2 + \alpha_3\mathbf a_3.$$

5 If the vectors are not originally drawn from the same point, we can transport them parallel to themselves to a common point.

Figure 6.5: Any vector in space can be written as a linear combination of three non-coplanar vectors.


This shows that 


Box 6.2.1. The maximum number of linearly independent vectors in space 
is three. Any three non-coplanar vectors form a basis for the space. 


It follows that the space is a three-dimensional vector space. 
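The geometric argument guarantees that the coefficients $\alpha_1, \alpha_2, \alpha_3$ exist. When the vectors are given by Cartesian coordinates, the coefficients can be found by solving three linear equations. The sketch below does this by Gauss-Jordan elimination with exact fractions; that elimination method is our own choice (the text has not yet discussed how to solve such systems), and the vectors used are hypothetical.

```python
from fractions import Fraction


def components(basis, b):
    """Solve b = sum_k alpha_k * basis[k] for the alphas (exact arithmetic).

    basis: three vectors in Cartesian form; b: any vector.
    Raises ValueError if the basis vectors turn out to be coplanar.
    """
    n = len(b)
    # Augmented matrix: column k holds basis[k]; the last column holds b.
    M = [[Fraction(basis[k][i]) for k in range(n)] + [Fraction(b[i])]
         for i in range(n)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            raise ValueError("vectors are coplanar: not a basis")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):                   # clear this column in every other row
            if r != col and M[r][col] != 0:
                factor = M[r][col] / M[col][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


a1, a2, a3 = (1, 0, 0), (1, 1, 0), (1, 1, 1)
alphas = components([a1, a2, a3], (2, 3, 4))
print([str(x) for x in alphas])             # ['-1', '-1', '4']

try:
    components([(1, 0, 0), (0, 1, 0), (1, 1, 0)], (1, 2, 3))
except ValueError as err:
    print(err)                              # vectors are coplanar: not a basis
```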

In the previous section we introduced $\mathcal P_1[t]$, the set of polynomials of first degree, and showed that they could be treated as vectors. We even defined an inner product for these vectors, and from that, we calculated the length of a vector and the angle between two vectors. This process can be generalized to three dimensions. Let $\mathcal P_2[t]$ be the set of polynomials of degree 2 (or less) in the variable $t$. One can easily show that such a set, a typical element of which looks like $\alpha_0 + \alpha_1 t + \alpha_2 t^2$, has all the properties of arrows in space. We shall use $\mathcal P_2[t]$ as a prototype of vectors that are not directed line segments. Clearly, $\{1, t, t^2\}$ form a basis for $\mathcal P_2[t]$; therefore, $\mathcal P_2[t]$ is a three-dimensional vector space.


polynomials of degree 2 or less form a 3-dimensional vector space.

transformation of vectors in space leads to $3\times 3$ matrices.


6.2.1 Transformation of Vectors 

In the case of the plane, the machinery of matrices connected the components of a vector in different bases. In the same context, we contrasted active and passive transformations. From now on, we want to concentrate on active transformations, i.e., we consider transformations that alter the vectors rather than the axes.

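Consider a vector $\mathbf a$ with components $(\alpha_1, \alpha_2, \alpha_3)$ in the basis $B = \{\mathbf a_1, \mathbf a_2, \mathbf a_3\}$. If we transform this vector, it will acquire new components, $(\alpha_1', \alpha_2', \alpha_3')$, in the same basis $B$. We can therefore write

$$\mathbf a = \alpha_1\mathbf a_1 + \alpha_2\mathbf a_2 + \alpha_3\mathbf a_3 \qquad\text{and}\qquad \mathbf a' = \alpha_1'\mathbf a_1 + \alpha_2'\mathbf a_2 + \alpha_3'\mathbf a_3, \qquad (6.26)$$

where $\mathbf a'$ is the transform of $\mathbf a$. Now suppose that we transform both $\mathbf a$ and the basis vectors in exactly the same manner. Then the components of the transformed $\mathbf a$ will be the same in the new basis as the original $\mathbf a$ was in the old basis:

$$\mathbf a' = \alpha_1\mathbf a_1' + \alpha_2\mathbf a_2' + \alpha_3\mathbf a_3'. \qquad (6.27)$$

Since $B$ is a basis, any vector, in particular each transformed basis vector, can be written in terms of its elements:

$$\mathbf a_1' = a_{11}\mathbf a_1 + a_{21}\mathbf a_2 + a_{31}\mathbf a_3,$$

$$\mathbf a_2' = a_{12}\mathbf a_1 + a_{22}\mathbf a_2 + a_{32}\mathbf a_3, \qquad (6.28)$$

$$\mathbf a_3' = a_{13}\mathbf a_1 + a_{23}\mathbf a_2 + a_{33}\mathbf a_3.$$

Now substitute Equation (6.28) in the RHS of (6.27), and the second equation of (6.26) in the LHS of (6.27), and rearrange terms to obtain

$$(\alpha_1' - a_{11}\alpha_1 - a_{12}\alpha_2 - a_{13}\alpha_3)\mathbf a_1 + (\alpha_2' - a_{21}\alpha_1 - a_{22}\alpha_2 - a_{23}\alpha_3)\mathbf a_2 + (\alpha_3' - a_{31}\alpha_1 - a_{32}\alpha_2 - a_{33}\alpha_3)\mathbf a_3 = 0.$$

The linear independence of $\mathbf a_1$, $\mathbf a_2$, and $\mathbf a_3$ gives

$$\alpha_1' = a_{11}\alpha_1 + a_{12}\alpha_2 + a_{13}\alpha_3,$$

$$\alpha_2' = a_{21}\alpha_1 + a_{22}\alpha_2 + a_{23}\alpha_3, \qquad (6.29)$$

$$\alpha_3' = a_{31}\alpha_1 + a_{32}\alpha_2 + a_{33}\alpha_3,$$

which, with the introduction of $3\times 1$ (column) and $3\times 3$ matrices, can be written concisely as

$$\begin{pmatrix}\alpha_1'\\ \alpha_2'\\ \alpha_3'\end{pmatrix} = \begin{pmatrix}a_{11} & a_{12} & a_{13}\\ a_{21} & a_{22} & a_{23}\\ a_{31} & a_{32} & a_{33}\end{pmatrix}\begin{pmatrix}\alpha_1\\ \alpha_2\\ \alpha_3\end{pmatrix} \qquad\text{or}\qquad \mathsf a' = \mathsf A\mathsf a. \qquad (6.30)$$

To know how a general vector transforms, we only need the transformation matrix, namely the $3\times 3$ matrix in Equation (6.30). This, in turn, is obtained completely from the transformation of basis vectors as given in Equation (6.28). The reader should note, however, that the coefficients in each line of (6.28) appear as a column in the transformation matrix. Thus,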
Box 6.2.2. To find the transformation matrix, apply the transformation to the basis vectors, and write the transformed basis vectors in terms of the old basis vectors. The “horizontal” coefficients become the columns of the transformation matrix.


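Let us apply a transformation to $\mathbf a'$ and to $\{\mathbf a_1', \mathbf a_2', \mathbf a_3'\}$. We could denote the new vectors by a second prime; but then it would give the impression that it is the same transformation as the earlier one. This is not the case. Therefore, we use a new symbol to emphasize that the second transformation is of a completely different nature, and denote the new transformed vectors by $\mathbf a^{\#}$ and $\{\mathbf a_1^{\#}, \mathbf a_2^{\#}, \mathbf a_3^{\#}\}$. In the basis $\{\mathbf a_1, \mathbf a_2, \mathbf a_3\}$, $\mathbf a^{\#}$ can be written as

$$\mathbf a^{\#} = \alpha_1''\mathbf a_1 + \alpha_2''\mathbf a_2 + \alpha_3''\mathbf a_3, \qquad (6.31)$$

while the application of the new transformation to the second equation of (6.26) gives

$$\mathbf a^{\#} = \alpha_1'\mathbf a_1^{\#} + \alpha_2'\mathbf a_2^{\#} + \alpha_3'\mathbf a_3^{\#}.$$

The vectors on the RHS can be written as linear combinations of $\{\mathbf a_1, \mathbf a_2, \mathbf a_3\}$:

$$\mathbf a_1^{\#} = a_{11}'\mathbf a_1 + a_{21}'\mathbf a_2 + a_{31}'\mathbf a_3,$$

$$\mathbf a_2^{\#} = a_{12}'\mathbf a_1 + a_{22}'\mathbf a_2 + a_{32}'\mathbf a_3, \qquad (6.32)$$

$$\mathbf a_3^{\#} = a_{13}'\mathbf a_1 + a_{23}'\mathbf a_2 + a_{33}'\mathbf a_3.$$

Using the by-now-familiar procedure, we can relate the coefficients as follows:

$$\begin{pmatrix}\alpha_1''\\ \alpha_2''\\ \alpha_3''\end{pmatrix} = \begin{pmatrix}a_{11}' & a_{12}' & a_{13}'\\ a_{21}' & a_{22}' & a_{23}'\\ a_{31}' & a_{32}' & a_{33}'\end{pmatrix}\begin{pmatrix}\alpha_1'\\ \alpha_2'\\ \alpha_3'\end{pmatrix} \qquad\text{or}\qquad \mathsf a'' = \mathsf A'\mathsf a'. \qquad (6.33)$$

We can also find how $\mathsf a''$ and $\mathsf a$ are related in two ways. The first way applies the new transformation to both sides of Equations (6.27) and (6.28), substitutes (6.32) in the transformed (6.28), and substitutes the result in the transformed (6.27). This will give $\mathbf a^{\#}$ as a linear combination of $\mathbf a_1$, $\mathbf a_2$, and $\mathbf a_3$. Equating this with Equation (6.31) will give us a matrix relation between $\mathsf a''$ and $\mathsf a$. Second, we can substitute the matrix relation of Equation (6.30) in that of (6.33) and obtain a relation between $\mathsf a''$ and $\mathsf a$ via the product of two matrices. Comparison of these two relations will give us the rule of multiplication for $3\times 3$ matrices which, except for the number of elements involved, is identical to the multiplication rule for $2\times 2$ matrices. Similarly, multiplication by a row or a column vector, etc., is exactly as before.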
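The second route above says that performing $\mathsf A$ and then $\mathsf A'$ is the same as applying the single matrix $\mathsf A'\mathsf A$. Here is a quick numerical check of that statement; the two $3\times 3$ integer matrices and the column $\mathsf a$ are arbitrary, hypothetical choices.

```python
def matmul(A, B):
    """Row-by-column product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def apply(A, v):
    """Matrix times column vector."""
    return [sum(A[i][j] * v[j] for j in range(len(v))) for i in range(len(A))]


A = [[1, 2, 0], [0, 1, 3], [4, 0, 1]]     # hypothetical first transformation
Ap = [[0, 1, 0], [1, 0, 0], [0, 0, 2]]    # hypothetical second transformation A'
alpha = [1, -1, 2]

two_steps = apply(Ap, apply(A, alpha))    # alpha'' = A'(A alpha)
one_step = apply(matmul(Ap, A), alpha)    # alpha'' = (A'A) alpha
print(two_steps, one_step)                # [5, -1, 12] [5, -1, 12]
```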

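There is a new kind of matrix associated with the space that we could not consider in our discussion of the plane. Let $B = \{\mathbf a_1, \mathbf a_2, \mathbf a_3\}$ be a basis for the space, and take any two of the vectors in $B$, say $\mathbf a_1$ and $\mathbf a_2$. These two vectors form a plane any vector of which has only two components: If $\mathbf a$ is in this plane, it can be written as

$$\mathbf a = \alpha_1\mathbf a_1 + \alpha_2\mathbf a_2.$$

Now suppose we apply the same transformation to both $\mathbf a$ and $\{\mathbf a_1, \mathbf a_2\}$. Then, on the one hand, $\mathbf a' = \alpha_1\mathbf a_1' + \alpha_2\mathbf a_2'$, and on the other hand, $\mathbf a' = \alpha_1'\mathbf a_1 + \alpha_2'\mathbf a_2 + \alpha_3'\mathbf a_3$, because the transformed $\mathbf a$, in general, comes out of the plane of $\mathbf a_1$ and $\mathbf a_2$. Therefore,

$$\alpha_1\mathbf a_1' + \alpha_2\mathbf a_2' = \alpha_1'\mathbf a_1 + \alpha_2'\mathbf a_2 + \alpha_3'\mathbf a_3. \qquad (6.34)$$

But we also have

$$\mathbf a_1' = a_{11}\mathbf a_1 + a_{21}\mathbf a_2 + a_{31}\mathbf a_3,$$

$$\mathbf a_2' = a_{12}\mathbf a_1 + a_{22}\mathbf a_2 + a_{32}\mathbf a_3.$$

Substituting these in Equation (6.34) yields

$$(\alpha_1' - a_{11}\alpha_1 - a_{12}\alpha_2)\mathbf a_1 + (\alpha_2' - a_{21}\alpha_1 - a_{22}\alpha_2)\mathbf a_2 + (\alpha_3' - a_{31}\alpha_1 - a_{32}\alpha_2)\mathbf a_3 = 0.$$

Linear independence of the vectors in $B$ now gives

$$\alpha_1' = a_{11}\alpha_1 + a_{12}\alpha_2,$$

$$\alpha_2' = a_{21}\alpha_1 + a_{22}\alpha_2, \qquad (6.35)$$

$$\alpha_3' = a_{31}\alpha_1 + a_{32}\alpha_2,$$

which can be written in matrix form as

$$\begin{pmatrix}\alpha_1'\\ \alpha_2'\\ \alpha_3'\end{pmatrix} = \begin{pmatrix}a_{11} & a_{12}\\ a_{21} & a_{22}\\ a_{31} & a_{32}\end{pmatrix}\begin{pmatrix}\alpha_1\\ \alpha_2\end{pmatrix} \qquad\text{or}\qquad \mathsf a' = \mathsf A\mathsf a. \qquad (6.36)$$

The matrix $\mathsf A$ is now a $3\times 2$ matrix. It relates two-component column vectors to three-component column vectors.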

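Example 6.2.1. Another way to illustrate the preceding discussion is to use first degree polynomials. Let us multiply all polynomials of $\mathcal P_1[t]$ by a fixed first degree polynomial, say $1 + t$. This will transform vectors of $\mathcal P_1[t]$ into vectors of $\mathcal P_2[t]$. In particular, it will transform the basis $\{1, t\}$ into vectors in $\mathcal P_2[t]$ which can be expressed as linear combinations of the basis vectors $\{1, t, t^2\}$ of $\mathcal P_2[t]$. Let $\mathbf f_1 = 1$, $\mathbf f_2 = t$, and $\mathbf f_3 = t^2$, and note that

$$\mathbf f_1' = 1\cdot(1 + t) = 1 + t = 1\cdot\mathbf f_1 + 1\cdot\mathbf f_2 + 0\cdot\mathbf f_3,$$

$$\mathbf f_2' = t\,(1 + t) = t + t^2 = 0\cdot\mathbf f_1 + 1\cdot\mathbf f_2 + 1\cdot\mathbf f_3.$$

According to Box 6.2.2, the transformation matrix is

$$\mathsf A = \begin{pmatrix}1 & 0\\ 1 & 1\\ 0 & 1\end{pmatrix},$$

from which we can find the transform of a general vector $\mathbf f = \alpha_0 + \alpha_1 t$ in $\mathcal P_1[t]$. If the transformed vector is written as $\mathbf f' = \alpha_0' + \alpha_1' t + \alpha_2' t^2$, then

$$\begin{pmatrix}\alpha_0'\\ \alpha_1'\\ \alpha_2'\end{pmatrix} = \begin{pmatrix}1 & 0\\ 1 & 1\\ 0 & 1\end{pmatrix}\begin{pmatrix}\alpha_0\\ \alpha_1\end{pmatrix} = \begin{pmatrix}\alpha_0\\ \alpha_0 + \alpha_1\\ \alpha_1\end{pmatrix}.$$

This can be verified directly by multiplying $\mathbf f = \alpha_0 + \alpha_1 t$ by $1 + t$. ■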
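Box 6.2.2 is mechanical enough to automate. The following sketch (helper names are our own) multiplies each basis vector of $\mathcal P_1[t]$ by $1 + t$, stores the resulting coefficient lists as the columns of $\mathsf A$, and checks the matrix against direct polynomial multiplication for a hypothetical $\mathbf f = 5 - 2t$.

```python
def poly_mul(p, q):
    """Coefficient list of the product of two polynomials."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def matrix_from_images(images, rows):
    """Box 6.2.2: coefficients of the k-th transformed basis vector form column k."""
    padded = [img + [0] * (rows - len(img)) for img in images]
    return [[padded[k][i] for k in range(len(images))] for i in range(rows)]


one_plus_t = [1, 1]
basis_P1 = [[1], [0, 1]]                                  # 1, t
images = [poly_mul(one_plus_t, f) for f in basis_P1]      # 1 + t, t + t^2
A = matrix_from_images(images, 3)
print(A)                                                  # [[1, 0], [1, 1], [0, 1]]

alpha = [5, -2]                                           # hypothetical f = 5 - 2t
via_matrix = [sum(A[i][j] * alpha[j] for j in range(2)) for i in range(3)]
print(via_matrix, poly_mul(one_plus_t, alpha))            # [5, 3, -2] [5, 3, -2]
```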


In the discussion above, we started with the plane (with two dimensions) and transformed to space (with three dimensions). Example 6.2.1 illustrated this transformation for $\mathcal P_1[t]$ and $\mathcal P_2[t]$. We can also start with three dimensions and end up in two dimensions. The result will be a matrix relation of the form

$$\mathsf a' = \mathsf B\mathsf a,$$

with $\mathsf a'$ a two-component and $\mathsf a$ a three-component column vector, and $\mathsf B$ a $2\times 3$ matrix. The following example illustrates this point.

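Example 6.2.2. Let us start with $\mathcal P_2[t]$ and, as transformation, consider differentiation, which acts on the basis $\{1, t, t^2\}$. It is clear that the resulting vectors will belong to $\mathcal P_1[t]$, because they will be linear combinations of 1 and $t$. With $\mathbf f_1 = 1$, $\mathbf f_2 = t$, and $\mathbf f_3 = t^2$, and using a prime to denote the transformed vector, we can write

$$\mathbf f_1' = \frac{d}{dt}(1) = 0 = 0\cdot\mathbf f_1 + 0\cdot\mathbf f_2,$$

$$\mathbf f_2' = \frac{d}{dt}(t) = 1 = 1\cdot\mathbf f_1 + 0\cdot\mathbf f_2,$$

$$\mathbf f_3' = \frac{d}{dt}(t^2) = 2t = 0\cdot\mathbf f_1 + 2\cdot\mathbf f_2,$$

giving rise to the transformation matrix

$$\mathsf B = \begin{pmatrix}0 & 1 & 0\\ 0 & 0 & 2\end{pmatrix}.$$

The reader may verify that the coefficients $(\alpha_0', \alpha_1')$ in $\mathcal P_1[t]$ of the derivative of an arbitrary polynomial $f(t) = \alpha_0 + \alpha_1 t + \alpha_2 t^2$ are given by

$$\begin{pmatrix}\alpha_0'\\ \alpha_1'\end{pmatrix} = \begin{pmatrix}0 & 1 & 0\\ 0 & 0 & 2\end{pmatrix}\begin{pmatrix}\alpha_0\\ \alpha_1\\ \alpha_2\end{pmatrix} = \begin{pmatrix}\alpha_1\\ 2\alpha_2\end{pmatrix},$$

which can also be obtained directly by differentiating $f(t)$.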
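The same recipe produces the $2\times 3$ differentiation matrix of Example 6.2.2. In this sketch (our own illustration; helper names are hypothetical) polynomials in $\mathcal P_2[t]$ are coefficient lists of length 3, and the derivative of each basis vector supplies one column of $\mathsf B$. The result is then compared with direct differentiation of a hypothetical $f(t) = 7 + 4t - 3t^2$.

```python
def derivative(p):
    """Coefficients of dp/dt for p = [c0, c1, c2, ...]."""
    return [k * c for k, c in enumerate(p)][1:]


basis_P2 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]       # 1, t, t^2
images = [derivative(f) for f in basis_P2]         # [0, 0], [1, 0], [0, 2]

# Box 6.2.2: the coefficients of the k-th image form column k of B.
B = [[images[k][i] for k in range(3)] for i in range(2)]
print(B)                                           # [[0, 1, 0], [0, 0, 2]]

alpha = [7, 4, -3]                                 # hypothetical f(t) = 7 + 4t - 3t^2
via_matrix = [sum(B[i][j] * alpha[j] for j in range(3)) for i in range(2)]
print(via_matrix, derivative(alpha))               # [4, -6] [4, -6]
```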


The point of this discussion is that if you have two collections of vectors with different numbers of components, then it is possible to construct matrices that relate the two sets of vectors. These matrices have different numbers of rows and columns. The mathematics of these new matrices (their notion of equality, their addition, subtraction, multiplication, transposition, etc.) is exactly the same as before, provided the numbers of rows and columns involved are compatible.

differentiation is a (linear) transformation.
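Example 6.2.3. Suppose

$$\mathsf A = \begin{pmatrix}1 & -1\\ -1 & 2\\ 0 & 1\end{pmatrix} \qquad\text{and}\qquad \mathsf B = \begin{pmatrix}-1 & 0 & 1\\ 1 & 2 & -2\end{pmatrix}.$$

Then $\mathsf A + \mathsf B$ is not defined, but

$$\tilde{\mathsf A} + \mathsf B = \begin{pmatrix}1 & -1 & 0\\ -1 & 2 & 1\end{pmatrix} + \begin{pmatrix}-1 & 0 & 1\\ 1 & 2 & -2\end{pmatrix} = \begin{pmatrix}0 & -1 & 1\\ 0 & 4 & -1\end{pmatrix}$$

and

$$\mathsf A + \tilde{\mathsf B} = \begin{pmatrix}1 & -1\\ -1 & 2\\ 0 & 1\end{pmatrix} + \begin{pmatrix}-1 & 1\\ 0 & 2\\ 1 & -2\end{pmatrix} = \begin{pmatrix}0 & 0\\ -1 & 4\\ 1 & -1\end{pmatrix} = \widetilde{(\tilde{\mathsf A} + \mathsf B)}.$$

As for multiplication, we have

$$\mathsf A\mathsf B = \begin{pmatrix}1 & -1\\ -1 & 2\\ 0 & 1\end{pmatrix}\begin{pmatrix}-1 & 0 & 1\\ 1 & 2 & -2\end{pmatrix} = \begin{pmatrix}-2 & -2 & 3\\ 3 & 4 & -5\\ 1 & 2 & -2\end{pmatrix}$$

and

$$\mathsf B\mathsf A = \begin{pmatrix}-1 & 0 & 1\\ 1 & 2 & -2\end{pmatrix}\begin{pmatrix}1 & -1\\ -1 & 2\\ 0 & 1\end{pmatrix} = \begin{pmatrix}-1 & 2\\ -1 & 1\end{pmatrix},$$

where the element in the $i$th row and $j$th column of the product is obtained by multiplying the $i$th row of the left factor by the $j$th column of the right factor term-by-term and adding the products (see Box 6.1.3). ■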
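Example 6.2.3 can be replayed with plain Python lists, which also makes the size rules explicit: addition needs equal shapes, and a product needs the number of columns of the left factor to equal the number of rows of the right factor. This is a sketch with our own helper names, not a library API.

```python
def transpose(M):
    return [list(row) for row in zip(*M)]


def add(M, N):
    """Elementwise sum; defined only for matrices of the same size."""
    if len(M) != len(N) or len(M[0]) != len(N[0]):
        raise ValueError(f"cannot add {len(M)}x{len(M[0])} and "
                         f"{len(N)}x{len(N[0])} matrices")
    return [[x + y for x, y in zip(r, s)] for r, s in zip(M, N)]


def matmul(A, B):
    """Entry (i, j) is the i-th row of A times the j-th column of B, summed."""
    if len(A[0]) != len(B):
        raise ValueError("columns of the left factor must equal rows of the right factor")
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


A = [[1, -1], [-1, 2], [0, 1]]     # 3 x 2
B = [[-1, 0, 1], [1, 2, -2]]       # 2 x 3

print(add(A, transpose(B)))                  # [[0, 0], [-1, 4], [1, -1]]
print(transpose(add(transpose(A), B)))       # the same matrix
print(matmul(A, B))                          # [[-2, -2, 3], [3, 4, -5], [1, 2, -2]]
print(matmul(B, A))                          # [[-1, 2], [-1, 1]]
try:
    add(A, B)
except ValueError as err:
    print(err)                               # cannot add 3x2 and 2x3 matrices
```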


The $3\times 3$ matrix

$$1 = \begin{pmatrix}1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & 1\end{pmatrix}$$

is the $3\times 3$ identity matrix (or unit matrix), and has the property that when it multiplies any other $3\times 3$ matrix on either side, the latter is unaffected. Similarly, when this identity matrix multiplies a three-component column vector (from the left) or a three-component row vector (from the right), it leaves them unchanged. As in the case of the plane, the unit matrix is used to define the inverse of a matrix $\mathsf A$ as a matrix $\mathsf B$ that multiplies $\mathsf A$ on either side and gives the unit matrix.


6.2.2 Inner Product 

As in the case of two dimensions, the usual rule of the dot product of space vectors in terms of their components along $\mathbf e_x$, $\mathbf e_y$, and $\mathbf e_z$ does not apply in the general case. For that, we need an inner product matrix $\mathsf G$. As in the plane, this is a matrix whose elements are dot products of the basis vectors. If $B = \{\mathbf a_1, \mathbf a_2, \mathbf a_3\}$ is a basis for space, then $\mathsf G$ is a $3\times 3$ symmetric matrix

$$\mathsf G = \begin{pmatrix}g_{11} & g_{12} & g_{13}\\ g_{21} & g_{22} & g_{23}\\ g_{31} & g_{32} & g_{33}\end{pmatrix}, \qquad g_{ij} = g_{ji} = \mathbf a_i\cdot\mathbf a_j, \quad i, j = 1, 2, 3. \qquad (6.38)$$
Example 6.2.4. Let us find the inner product matrix for the basis $\{1, t, t^2\}$ of $\mathcal P_2[t]$ when the inner product is integration from 0 to 1. Because of the symmetry of the matrix and the fact that we have already calculated the $2\times 2$ submatrix of $\mathsf G$, we need to find $g_{13}$, $g_{23}$, and $g_{33}$. Let $\mathbf f_1 = f_1(t) = 1$, $\mathbf f_2 = f_2(t) = t$, and $\mathbf f_3 = f_3(t) = t^2$; then

$$g_{13} = \mathbf f_1\cdot\mathbf f_3 = \int_0^1 f_1(t)f_3(t)\,dt = \int_0^1 t^2\,dt = \frac13,$$

$$g_{23} = \mathbf f_2\cdot\mathbf f_3 = \int_0^1 f_2(t)f_3(t)\,dt = \int_0^1 t^3\,dt = \frac14,$$

$$g_{33} = \mathbf f_3\cdot\mathbf f_3 = \int_0^1 f_3(t)f_3(t)\,dt = \int_0^1 t^4\,dt = \frac15.$$

It follows that

$$\mathsf G = \begin{pmatrix}1 & \frac12 & \frac13\\ \frac12 & \frac13 & \frac14\\ \frac13 & \frac14 & \frac15\end{pmatrix}.$$

This matrix can be used to find the dot product of any two vectors in terms of their components in the basis $\{1, t, t^2\}$ of $\mathcal P_2[t]$. ■
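The whole matrix of Example 6.2.4 follows from the same integral formula, $g_{ij} = \int_0^1 t^{i+j-2}\,dt$. This exact-arithmetic sketch (helper names are our own) builds $\mathsf G$ and then, as a usage example, computes $\mathbf f\cdot\mathbf g$ for the hypothetical pair $\mathbf f = 1 + t$, $\mathbf g = t^2$ both through $\mathsf G$ and by direct integration.

```python
from fractions import Fraction


def inner(p, q):
    """Integral over [0, 1] of p(t) q(t); polynomials as coefficient lists."""
    return sum(Fraction(a) * b / (i + j + 1)
               for i, a in enumerate(p) for j, b in enumerate(q))


basis = [[1], [0, 1], [0, 0, 1]]          # 1, t, t^2
G = [[inner(p, q) for q in basis] for p in basis]
for row in G:
    print(*row)
# 1 1/2 1/3
# 1/2 1/3 1/4
# 1/3 1/4 1/5

# Usage: f = 1 + t has components (1, 1, 0); g = t^2 has components (0, 0, 1).
f, g = (1, 1, 0), (0, 0, 1)
via_G = sum(f[i] * G[i][j] * g[j] for i in range(3) for j in range(3))
print(via_G, inner([1, 1], [0, 0, 1]))    # 7/12 7/12
```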


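If $\mathbf a$ and $\mathbf b$ have components $(\alpha_1, \alpha_2, \alpha_3)$ and $(\beta_1, \beta_2, \beta_3)$ in $B$, then their inner product is given by

$$\tilde{\mathsf a}\,\mathsf G\,\mathsf b = \begin{pmatrix}\alpha_1 & \alpha_2 & \alpha_3\end{pmatrix}\begin{pmatrix}g_{11} & g_{12} & g_{13}\\ g_{21} & g_{22} & g_{23}\\ g_{31} & g_{32} & g_{33}\end{pmatrix}\begin{pmatrix}\beta_1\\ \beta_2\\ \beta_3\end{pmatrix}. \qquad (6.39)$$

If this expression is zero, we say that $\mathbf a$ and $\mathbf b$ are G-orthogonal. For an orthonormal basis, the inner product matrix $\mathsf G$ becomes the unit matrix$^{6}$ and we recover the usual inner product of space vectors in terms of components.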

As discussed in the case of the plane, every point in space can be thought of as the tip of a vector whose tail is the origin. Then we can express the ($\mathsf G$-dependent) distance between two points in terms of vectors. Let $\mathbf r_1$ be the vector to point $P_1$ and $\mathbf r_2$ the vector to point $P_2$. Then the length of the displacement vector $\Delta\mathbf r = \mathbf r_1 - \mathbf r_2$ is the “distance” between $P_1$ and $P_2$:

$$\overline{P_1P_2}^{\,2} = \Delta\mathbf r\cdot\Delta\mathbf r = (\mathbf r_1 - \mathbf r_2)\cdot(\mathbf r_1 - \mathbf r_2) = \widetilde{\Delta\mathsf r}\,\mathsf G\,\Delta\mathsf r. \qquad (6.40)$$

Recall that only in the positive definite case is $\overline{P_1P_2}^{\,2}$ nonnegative.

As in the case of the plane, it is convenient to construct orthonormal basis vectors in space. This can be done by the Gram-Schmidt process. Suppose $B = \{\mathbf a_1, \mathbf a_2, \mathbf a_3\}$ is a basis for space as shown in Figure 6.6. Again, to avoid complications, we assume that the inner product is positive definite, so that the inner product of every nonzero vector with itself is positive. We know how to construct two orthonormal vectors out of $\{\mathbf a_1, \mathbf a_2\}$; we did that in our discussion of the plane.

6 Only if the inner product is positive definite.

G-orthogonal vectors in space

Gram-Schmidt process for vectors in space

G-orthogonal matrices

Figure 6.6: The Gram-Schmidt process for three linearly independent vectors in space.

Call these new orthonormal vectors $\{\mathbf e_1, \mathbf e_2\}$ and construct the vector $\mathbf a_3'$,

$$\mathbf a_3' = \mathbf a_3 - (\mathbf a_3\cdot\mathbf e_1)\,\mathbf e_1 - (\mathbf a_3\cdot\mathbf e_2)\,\mathbf e_2,$$

which is obtained from $\mathbf a_3$ by taking away its projections along $\mathbf e_1$ and $\mathbf e_2$. Now note that

$$\mathbf e_1\cdot\mathbf a_3' = \mathbf e_1\cdot\mathbf a_3 - (\mathbf a_3\cdot\mathbf e_1)\underbrace{\mathbf e_1\cdot\mathbf e_1}_{=1} - (\mathbf a_3\cdot\mathbf e_2)\underbrace{\mathbf e_2\cdot\mathbf e_1}_{=0} = 0,$$

$$\mathbf e_2\cdot\mathbf a_3' = \mathbf e_2\cdot\mathbf a_3 - (\mathbf a_3\cdot\mathbf e_1)\underbrace{\mathbf e_1\cdot\mathbf e_2}_{=0} - (\mathbf a_3\cdot\mathbf e_2)\underbrace{\mathbf e_2\cdot\mathbf e_2}_{=1} = 0,$$

i.e., $\mathbf a_3'$ is orthogonal to both $\mathbf e_1$ and $\mathbf e_2$. This suggests defining $\mathbf e_3$ as

$$\mathbf e_3 = \frac{\mathbf a_3'}{|\mathbf a_3'|} = \frac{\mathbf a_3'}{\sqrt{\mathbf a_3'\cdot\mathbf a_3'}}.$$

The reader should note that in the construction of $\{\mathbf e_1, \mathbf e_2, \mathbf e_3\}$, we have simply taken linear combinations of $\mathbf a_1$, $\mathbf a_2$, and $\mathbf a_3$.
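The plane and space constructions are two instances of one loop: for each new vector, subtract its projections on the $\mathbf e$'s already built, then normalize. The sketch below implements that loop for any number of vectors, assuming a positive definite inner product passed in as a function (the Euclidean dot product by default). As in the text, each projection is computed with the original vector, $\mathbf a\cdot\mathbf e_k$ (the classical form of the process). The three input vectors are an arbitrary example.

```python
import math


def dot(u, v):
    """Euclidean dot product of coordinate tuples."""
    return sum(x * y for x, y in zip(u, v))


def gram_schmidt(vectors, inner=dot):
    """Orthonormalize linearly independent vectors, in the given order.

    Each vector a has its projections on the e's already built removed,
    a' = a - sum_k (a . e_k) e_k, and is then divided by its length.
    The inner product is assumed to be positive definite.
    """
    es = []
    for a in vectors:
        ap = list(a)
        for e in es:
            p = inner(a, e)
            ap = [x - p * y for x, y in zip(ap, e)]
        n = math.sqrt(inner(ap, ap))
        if n < 1e-12:
            raise ValueError("vectors are linearly dependent")
        es.append([x / n for x in ap])
    return es


es = gram_schmidt([(1, 1, 0), (1, 0, 1), (0, 1, 1)])
# Mathematically: e1 = (1, 1, 0)/sqrt(2), e2 = (1, -1, 2)/sqrt(6), e3 = (-1, 1, 1)/sqrt(3)
for e in es:
    print([round(x, 6) for x in e])
```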

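Transformations that leave the inner products unchanged can be obtained in exactly the same way as for the plane. For $\mathsf A$ to preserve the inner product, we need to have

$$\tilde{\mathsf A}\mathsf G\mathsf A = \mathsf G, \qquad (6.41)$$

i.e., it has to be G-orthogonal. If $\mathsf G$ is the identity matrix, then $\mathsf A$ describes a rigid transformation (a rotation, possibly combined with a reflection) and is simply called orthogonal; it satisfies

$$\tilde{\mathsf A}\mathsf A = 1. \qquad (6.42)$$

If we write $\mathsf A$ as

$$\mathsf A = \begin{pmatrix}a_{11} & a_{12} & a_{13}\\ a_{21} & a_{22} & a_{23}\\ a_{31} & a_{32} & a_{33}\end{pmatrix},$$

then Equation (6.42) can be written as

$$\begin{pmatrix}a_{11} & a_{21} & a_{31}\\ a_{12} & a_{22} & a_{32}\\ a_{13} & a_{23} & a_{33}\end{pmatrix}\begin{pmatrix}a_{11} & a_{12} & a_{13}\\ a_{21} & a_{22} & a_{23}\\ a_{31} & a_{32} & a_{33}\end{pmatrix} = \begin{pmatrix}1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & 1\end{pmatrix}. \qquad (6.43)$$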


It is clear from Equation (6.43) that the columns of the matrix $\mathsf A$, considered as vectors, have unit length and are orthogonal to the other columns in the usual positive definite inner product.$^{7}$ This is why $\mathsf A$ is called orthogonal.

The product on the LHS of Equation (6.43) is a $3\times 3$ matrix whose elements must equal the corresponding elements of the unit matrix on the RHS. For example,

$$a_{11}^2 + a_{21}^2 + a_{31}^2 = 1. \qquad (6.44)$$

Similarly, the equality of the elements located in the first row and second column on both sides gives

$$a_{11}a_{12} + a_{21}a_{22} + a_{31}a_{32} = 0,$$

and so on. Thus we obtain nine equations. However, because $\tilde{\mathsf A}\mathsf A$ is a symmetric matrix, the equation coming from its $(i, j)$ element is the same as the one coming from its $(j, i)$ element, so only six of the nine equations are independent. Therefore, we can only solve for the nine unknowns in terms of three of them (see Section 7.6). It does not matter which three matrix elements we choose. If we choose $a_{11}$, $a_{21}$, and $a_{31}$, for example, then Equation (6.44) reveals that these parameters can be sines and cosines. What this means physically is that three parameters are required to specify a rigid rotation of the axes.

There are many ways to specify these three parameters. One of the most useful and convenient ways is by using Euler angles $\psi$, $\varphi$, and $\theta$ (see Figure 6.7). Example 6.2.5 below shows that in terms of these angles, the matrix $\mathsf A$ can be written as

$$\mathsf A = \begin{pmatrix}\cos\psi\cos\varphi - \sin\psi\cos\theta\sin\varphi & -\cos\psi\sin\varphi - \sin\psi\cos\theta\cos\varphi & \sin\psi\sin\theta\\ \sin\psi\cos\varphi + \cos\psi\cos\theta\sin\varphi & -\sin\psi\sin\varphi + \cos\psi\cos\theta\cos\varphi & -\cos\psi\sin\theta\\ \sin\theta\sin\varphi & \sin\theta\cos\varphi & \cos\theta\end{pmatrix}.$$

It is straightforward to verify that $\tilde{\mathsf A}\mathsf A = 1$. Euler angles are useful in describing the rotational motion of a rigid body in mechanics.

orthogonal matrices in space are determined by three parameters such as the Euler angles.

Euler angles


Example 6.2.5. From Figure 6.7 it should be clear that the primed basis is obtained from the basis $\{e_1, e_2, e_3\}$ by the following three operations.

(a) Rotate the coordinate system about the $e_3$-axis through angle $\varphi$. This corresponds to a rotation in the $e_1e_2$-plane, leaving the $e_3$-axis unchanged. We saw in the previous section what the 2 × 2 part of the matrix looks like. The complete 3 × 3 matrix corresponding to such a rotation is


$$A_1 = \begin{pmatrix} \cos\varphi & -\sin\varphi & 0\\ \sin\varphi & \cos\varphi & 0\\ 0 & 0 & 1 \end{pmatrix}. \qquad (6.45)$$

a general orthogonal matrix in space can be written as the product of three successive rotations.

⁷ This holds for 2 × 2 orthogonal matrices as well.
Figure 6.7: The Euler angles and the rotations about three axes making up a general rotation in space.


It is clear that this matrix leaves the third (z) component of a column vector unchanged while rotating the first two (x and y) components by $\varphi$.

(b) Rotate the new coordinate system around the new $e_1$-axis (the $\xi$-axis in the figure) through an angle $\theta$. The corresponding matrix is

$$A_2 = \begin{pmatrix} 1 & 0 & 0\\ 0 & \cos\theta & -\sin\theta\\ 0 & \sin\theta & \cos\theta \end{pmatrix}. \qquad (6.46)$$

(c) Rotate the system about the new $e_3$-axis (the $e'_3$-axis in the figure) through an angle $\psi$. The corresponding matrix is

$$A_3 = \begin{pmatrix} \cos\psi & -\sin\psi & 0\\ \sin\psi & \cos\psi & 0\\ 0 & 0 & 1 \end{pmatrix}. \qquad (6.47)$$

It is easily verified that $A = A_3A_2A_1$, i.e., the rotation A has the same effect as that of $A_1$, $A_2$, and $A_3$ performed in succession. ■
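The two claims in Example 6.2.5, that $A = A_3A_2A_1$ and that $A^tA = 1$, are easy to check numerically. The following is a minimal Python sketch using only the standard library; the helper names (`rot1`, `rot3`, `euler_matrix`) and the test angles are our own choices, and agreement to floating-point accuracy at one set of angles illustrates the identities rather than proving them.

```python
import math

def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]

def transpose(X):
    return [list(col) for col in zip(*X)]

def rot3(a):
    """Rotation through angle a about the third axis: the form of (6.45) and (6.47)."""
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def rot1(a):
    """Rotation through angle a about the first axis: the form of (6.46)."""
    c, s = math.cos(a), math.sin(a)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]

def euler_matrix(phi, theta, psi):
    """Closed-form matrix A written in terms of the Euler angles."""
    cf, sf = math.cos(phi), math.sin(phi)
    ct, st = math.cos(theta), math.sin(theta)
    cp, sp = math.cos(psi), math.sin(psi)
    return [[cp * cf - sp * ct * sf, -cp * sf - sp * ct * cf, sp * st],
            [sp * cf + cp * ct * sf, -sp * sf + cp * ct * cf, -cp * st],
            [st * sf, st * cf, ct]]

def max_abs_diff(X, Y):
    return max(abs(x - y) for rx, ry in zip(X, Y) for x, y in zip(rx, ry))

I3 = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
phi, theta, psi = 0.3, 1.1, -0.7          # arbitrary test angles (radians)

A = euler_matrix(phi, theta, psi)
A321 = matmul(rot3(psi), matmul(rot1(theta), rot3(phi)))   # A3 A2 A1
print(max_abs_diff(A, A321) < 1e-12)                        # True: A = A3 A2 A1
print(max_abs_diff(matmul(transpose(A), A), I3) < 1e-12)    # True: A^t A = 1
```

Note the order of the factors: $A_1$ acts first, so it stands rightmost in the product.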


6.3 Determinant 


from matrices to 
systems of linear 
equations to 
determinants 


Matrices have found application in many diverse fields of pure and applied mathematics. One such application is in the solution of linear equations. Consider the first set of equations in which we introduced matrices, Equations (6.4) and (6.5). The first of these equations associates a pair of numbers $(\alpha'_1, \alpha'_2)$ to a given pair $(\alpha_1, \alpha_2)$, i.e., if we know the latter pair, Equation (6.4) gives the former. What if we treat $(\alpha_1, \alpha_2)$ as unknown? Under what conditions can we find these unknowns in terms of the known pair $(\alpha'_1, \alpha'_2)$? Let us use a more suggestive notation and write Equation (6.4) as


$$\begin{aligned} a_{11}x + a_{12}y &= b_1,\\ a_{21}x + a_{22}y &= b_2. \end{aligned} \qquad (6.48)$$
We want to investigate conditions under which a pair (x, y) exists that satisfies Equation (6.48). Let us assume that none of the $a_{ij}$'s is zero. The case in which one of them is zero is included in the final conclusion we are about to draw. Multiply the first equation of (6.48) by $a_{22}$ and the second by $a_{12}$, and subtract the second resulting equation from the first. This yields $(a_{11}a_{22} - a_{12}a_{21})x = a_{22}b_1 - a_{12}b_2$, which has a solution for x of the form

$$x = \frac{a_{22}b_1 - a_{12}b_2}{a_{11}a_{22} - a_{12}a_{21}} = \frac{a_{22}b_1 - a_{12}b_2}{\det A} \qquad (6.49)$$

if $a_{11}a_{22} - a_{12}a_{21} \neq 0$. In the last equality we have defined the determinant of A:

$$A = \begin{pmatrix} a_{11} & a_{12}\\ a_{21} & a_{22} \end{pmatrix} \;\Rightarrow\; \det A = a_{11}a_{22} - a_{12}a_{21}. \qquad (6.50)$$

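We can also find y. Multiply the first equation of (6.48) by $a_{21}$ and the second by $a_{11}$, and subtract the first resulting equation from the second. This yields

$$(a_{11}a_{22} - a_{12}a_{21})\,y = a_{11}b_2 - a_{21}b_1,$$

which has a solution for y of the form

$$y = \frac{a_{11}b_2 - a_{21}b_1}{\det A}. \qquad (6.51)$$

We can combine Equations (6.49) and (6.51) into a single matrix equation:

$$\begin{pmatrix} x\\ y \end{pmatrix} = \frac{1}{\det A}\begin{pmatrix} a_{22} & -a_{12}\\ -a_{21} & a_{11} \end{pmatrix}\begin{pmatrix} b_1\\ b_2 \end{pmatrix}. \qquad (6.52)$$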


This is the matrix form of Equation (6.48) solved for x and y. Indeed, if we had written that equation in the form Ax = b, and if A had an inverse, say B, then we could have multiplied both sides of the equation by B and obtained

$$\underbrace{BA}_{=1}\,x = Bb \;\Rightarrow\; x = Bb.$$

This is precisely what we have in Equation (6.52)! Is the matrix multiplying the column vector b the inverse of A? Let us find out:

$$\frac{1}{\det A}\begin{pmatrix} a_{22} & -a_{12}\\ -a_{21} & a_{11} \end{pmatrix}\begin{pmatrix} a_{11} & a_{12}\\ a_{21} & a_{22} \end{pmatrix} = \frac{1}{\det A}\begin{pmatrix} a_{22}a_{11} - a_{12}a_{21} & 0\\ 0 & -a_{21}a_{12} + a_{11}a_{22} \end{pmatrix} = \begin{pmatrix} 1 & 0\\ 0 & 1 \end{pmatrix}.$$

So, it is indeed the inverse of A. We denote this inverse by $A^{-1}$.


Theorem 6.3.1. A matrix $A = \begin{pmatrix} a_{11} & a_{12}\\ a_{21} & a_{22} \end{pmatrix}$ has an inverse if and only if its determinant, defined by $\det A = a_{11}a_{22} - a_{12}a_{21}$, is not zero, in which case

$$A^{-1} = \frac{1}{\det A}\begin{pmatrix} a_{22} & -a_{12}\\ -a_{21} & a_{11} \end{pmatrix}.$$

determinant of a 2 × 2 matrix

The reader may verify that not only $A^{-1}A = 1$ but also $AA^{-1} = 1$.
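Theorem 6.3.1 translates directly into a few lines of code. The sketch below (Python, standard library only) uses exact rational arithmetic from the `fractions` module so that checks such as $A^{-1}A = 1$ involve no rounding; the function names are illustrative, and the test matrix is an arbitrary example with $\det A = 1$.

```python
from fractions import Fraction

def det2(A):
    """det A = a11 a22 - a12 a21, Equation (6.50)."""
    (a11, a12), (a21, a22) = A
    return a11 * a22 - a12 * a21

def inverse2(A):
    """A^{-1} from Theorem 6.3.1, computed exactly; fails if det A = 0."""
    d = det2(A)
    if d == 0:
        raise ValueError("det A = 0, so A has no inverse")
    (a11, a12), (a21, a22) = A
    return [[Fraction(a22) / d, Fraction(-a12) / d],
            [Fraction(-a21) / d, Fraction(a11) / d]]

def solve2(A, b):
    """Solve Equation (6.48) as x = A^{-1} b, i.e., Equation (6.52)."""
    inv = inverse2(A)
    return [inv[0][0] * b[0] + inv[0][1] * b[1],
            inv[1][0] * b[0] + inv[1][1] * b[1]]

A = [[2, 1], [5, 3]]
print(det2(A))              # 1
print(solve2(A, [4, 11]))   # [Fraction(1, 1), Fraction(2, 1)], i.e. x = 1, y = 2
```

Substituting x = 1, y = 2 back into 2x + y = 4 and 5x + 3y = 11 confirms the result.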

Equation (6.48) gives the components $b_1$ and $b_2$ of a new vector obtained from an old vector with components x and y when the matrix A acts on the latter. We want to see what conditions A must satisfy for it to transform the vectors of a basis into the vectors of a new basis. Let $B = \{a_1, a_2\}$ be the old basis. The components of $a_1$ in B are x = 1 and y = 0; so by (6.48), $a'_1$, the vector obtained from $a_1$ by the action of A, has components $b_1 = a_{11}$ and $b_2 = a_{21}$. The components of $a_2$ in B are x = 0 and y = 1; so $a'_2$, the vector obtained from $a_2$ by the action of A, has components $c_1 = a_{12}$ and $c_2 = a_{22}$. The vectors $(b_1, b_2)$ and $(c_1, c_2)$ form a basis if and only if they are linearly independent, i.e., if

$$(b_1, b_2) = k(c_1, c_2) = (kc_1, kc_2) \;\Rightarrow\; b_1 = kc_1,\; b_2 = kc_2$$

does not hold for any constant k. This is equivalent to saying that

$$\frac{b_1}{c_1} \neq \frac{b_2}{c_2} \quad\text{or}\quad b_1c_2 - b_2c_1 \neq 0.$$

Expressing the b's and c's in terms of the $a_{ij}$'s, we recognize the last relation as a condition on the determinant of A: $b_1c_2 - b_2c_1 = a_{11}a_{22} - a_{21}a_{12} = \det A \neq 0$. Using Theorem 6.3.1, we thus have

Box 6.3.1. A transformation (or a matrix) transforms a basis into another basis if and only if it is invertible.


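Let us now consider three equations in three unknowns:

$$\begin{aligned} a_{11}x + a_{12}y + a_{13}z &= b_1,\\ a_{21}x + a_{22}y + a_{23}z &= b_2,\\ a_{31}x + a_{32}y + a_{33}z &= b_3, \end{aligned} \qquad (6.53)$$

which can also be written in matrix form as

$$\begin{pmatrix} a_{11} & a_{12} & a_{13}\\ a_{21} & a_{22} & a_{23}\\ a_{31} & a_{32} & a_{33} \end{pmatrix}\begin{pmatrix} x\\ y\\ z \end{pmatrix} = \begin{pmatrix} b_1\\ b_2\\ b_3 \end{pmatrix} \;\Rightarrow\; Ax = b. \qquad (6.54)$$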


from three 
equations in three 
unknowns to two 
equations in two 
unknowns, and 
from the 
determinant of a 
2x2 matrix to 
that of a 3 x 3 
matrix 


We eliminate z from the set of equations by multiplying the first equation of (6.53) by $a_{23}$ and the second by $a_{13}$ and subtracting. This will give one equation in x and y. Similarly, multiplying the first equation by $a_{33}$ and the third by $a_{13}$ and subtracting gives another equation in x and y. These two equations are

$$\begin{aligned}
\underbrace{(a_{11}a_{23} - a_{21}a_{13})}_{\equiv\,\mathfrak{a}_{11}}x + \underbrace{(a_{12}a_{23} - a_{22}a_{13})}_{\equiv\,\mathfrak{a}_{12}}y &= \underbrace{a_{23}b_1 - a_{13}b_2}_{\equiv\,\mathfrak{b}_1},\\
\underbrace{(a_{11}a_{33} - a_{31}a_{13})}_{\equiv\,\mathfrak{a}_{21}}x + \underbrace{(a_{12}a_{33} - a_{32}a_{13})}_{\equiv\,\mathfrak{a}_{22}}y &= \underbrace{a_{33}b_1 - a_{13}b_3}_{\equiv\,\mathfrak{b}_2}.
\end{aligned} \qquad (6.55)$$
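Thus, we have reduced the three equations in three unknowns to two equations in two unknowns. We know how to find the solution for this set of equations: it is given in Equations (6.49) and (6.51). In order for this set to have a solution, the determinant of its coefficients must not vanish. Let us calculate this determinant:

$$\begin{aligned}
\mathfrak{a}_{11}\mathfrak{a}_{22} - \mathfrak{a}_{12}\mathfrak{a}_{21} &= (a_{11}a_{23} - a_{21}a_{13})(a_{12}a_{33} - a_{32}a_{13}) - (a_{12}a_{23} - a_{22}a_{13})(a_{11}a_{33} - a_{31}a_{13})\\
&= a_{11}a_{23}a_{12}a_{33} - a_{11}a_{23}a_{32}a_{13} - a_{21}a_{13}a_{12}a_{33} + a_{21}a_{13}a_{32}a_{13}\\
&\quad - a_{12}a_{23}a_{11}a_{33} + a_{12}a_{23}a_{31}a_{13} + a_{22}a_{13}a_{11}a_{33} - a_{22}a_{13}a_{31}a_{13}\\
&= a_{13}\left[a_{11}(a_{22}a_{33} - a_{23}a_{32}) - a_{12}(a_{21}a_{33} - a_{31}a_{23}) + a_{13}(a_{21}a_{32} - a_{22}a_{31})\right]\\
&= a_{13}\left[a_{11}\det\begin{pmatrix} a_{22} & a_{23}\\ a_{32} & a_{33} \end{pmatrix} - a_{12}\det\begin{pmatrix} a_{21} & a_{23}\\ a_{31} & a_{33} \end{pmatrix} + a_{13}\det\begin{pmatrix} a_{21} & a_{22}\\ a_{31} & a_{32} \end{pmatrix}\right].
\end{aligned}$$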

If the original set of equations is to have a solution, the expression in the square brackets must not vanish. We call this expression the determinant of the 3 × 3 matrix A. We can give a cookbook recipe for calculating the determinant, but first we need the following definition:

Box 6.3.2. The cofactor of an element $a_{ij}$ of a matrix A is defined as the product of $(-1)^{i+j}$ (i.e., +1 if i + j is even and −1 if i + j is odd) and the determinant of the smaller matrix (2 × 2, if A is a 3 × 3 matrix) obtained from A when its ith row and jth column are deleted.


The following recipe applies to any (square) matrix, not just to 3 × 3 matrices:

Box 6.3.3. The determinant of A is obtained by multiplying each element of a row (or a column) by its cofactor and adding the products.
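Boxes 6.3.2 and 6.3.3 define a recursive procedure: a determinant of size n is expressed through determinants of size n − 1. A minimal Python sketch of that recipe follows (the names `minor`, `cofactor`, and `det` are our own). Python indices start at 0, but since both i and j are shifted by one, the sign $(-1)^{i+j}$ is unchanged. The example also checks that expanding along a column gives the same value as expanding along the first row.

```python
def minor(A, i, j):
    """Matrix left when row i and column j are deleted (0-based indices)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]

def cofactor(A, i, j):
    """Box 6.3.2: (-1)^(i+j) times the determinant of the minor.
    Shifting both indices to 0-based leaves the parity of i + j unchanged."""
    return (-1) ** (i + j) * det(minor(A, i, j))

def det(A):
    """Box 6.3.3, expanding along the first row; works for any n x n matrix."""
    if len(A) == 0:          # empty matrix: determinant 1 by convention
        return 1
    return sum(A[0][j] * cofactor(A, 0, j) for j in range(len(A)))

def det_by_column(A, j):
    """The same recipe, expanding along column j instead."""
    return sum(A[i][j] * cofactor(A, i, j) for i in range(len(A)))

M = [[1, 2, 3], [4, 5, 6], [7, 8, 10]]
print(det(M))               # -3
print(det_by_column(M, 2))  # -3
```

This recursion performs a number of multiplications that grows roughly like n!, so it is practical only for small matrices; for large ones, row reduction (Gaussian elimination) is the usual choice.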

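If $\det A \neq 0$, then Equation (6.49) gives

$$x = \frac{\mathfrak{a}_{22}\mathfrak{b}_1 - \mathfrak{a}_{12}\mathfrak{b}_2}{a_{13}\det A}.$$

The numerator is

$$\begin{aligned}
\mathfrak{a}_{22}\mathfrak{b}_1 - \mathfrak{a}_{12}\mathfrak{b}_2 &= (a_{12}a_{33} - a_{32}a_{13})(a_{23}b_1 - a_{13}b_2) - (a_{12}a_{23} - a_{22}a_{13})(a_{33}b_1 - a_{13}b_3)\\
&= a_{13}\big[\underbrace{(a_{22}a_{33} - a_{32}a_{23})}_{\equiv\,C_{11}}b_1 + \underbrace{(a_{32}a_{13} - a_{12}a_{33})}_{\equiv\,C_{12}}b_2 + \underbrace{(a_{12}a_{23} - a_{22}a_{13})}_{\equiv\,C_{13}}b_3\big]\\
&= a_{13}(C_{11}b_1 + C_{12}b_2 + C_{13}b_3).
\end{aligned}$$

Therefore,

$$x = \frac{C_{11}b_1 + C_{12}b_2 + C_{13}b_3}{\det A}. \qquad (6.56)$$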

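Similarly, using Equation (6.51), we find

$$y = \frac{\mathfrak{a}_{11}\mathfrak{b}_2 - \mathfrak{a}_{21}\mathfrak{b}_1}{a_{13}\det A}$$

with

$$\begin{aligned}
\mathfrak{a}_{11}\mathfrak{b}_2 - \mathfrak{a}_{21}\mathfrak{b}_1 &= (a_{11}a_{23} - a_{21}a_{13})(a_{33}b_1 - a_{13}b_3) - (a_{11}a_{33} - a_{31}a_{13})(a_{23}b_1 - a_{13}b_2)\\
&= a_{13}\big[\underbrace{(a_{31}a_{23} - a_{21}a_{33})}_{\equiv\,C_{21}}b_1 + \underbrace{(a_{11}a_{33} - a_{31}a_{13})}_{\equiv\,C_{22}}b_2 + \underbrace{(a_{21}a_{13} - a_{11}a_{23})}_{\equiv\,C_{23}}b_3\big]\\
&= a_{13}(C_{21}b_1 + C_{22}b_2 + C_{23}b_3),
\end{aligned}$$

so that

$$y = \frac{C_{21}b_1 + C_{22}b_2 + C_{23}b_3}{\det A}. \qquad (6.57)$$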

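With x and y thus determined, we can substitute them in any of the three original equations and find z. Let us use the first equation; then

$$z = \frac{b_1 - a_{11}x - a_{12}y}{a_{13}} = \frac{b_1 - a_{11}\dfrac{C_{11}b_1 + C_{12}b_2 + C_{13}b_3}{\det A} - a_{12}\dfrac{C_{21}b_1 + C_{22}b_2 + C_{23}b_3}{\det A}}{a_{13}} = \frac{b_1(\det A - a_{11}C_{11} - a_{12}C_{21}) - b_2(a_{11}C_{12} + a_{12}C_{22}) - b_3(a_{11}C_{13} + a_{12}C_{23})}{a_{13}\det A}.$$

The numerator N can be calculated:

$$\begin{aligned}
N &= b_1\big[a_{11}(a_{22}a_{33} - a_{23}a_{32}) - a_{12}(a_{21}a_{33} - a_{31}a_{23}) + a_{13}(a_{21}a_{32} - a_{22}a_{31})\\
&\qquad - a_{11}(a_{22}a_{33} - a_{32}a_{23}) - a_{12}(a_{31}a_{23} - a_{21}a_{33})\big]\\
&\quad - b_2\big[a_{11}(a_{32}a_{13} - a_{12}a_{33}) + a_{12}(a_{11}a_{33} - a_{31}a_{13})\big]\\
&\quad - b_3\big[a_{11}(a_{12}a_{23} - a_{22}a_{13}) + a_{12}(a_{21}a_{13} - a_{11}a_{23})\big]\\
&= a_{13}\big[\underbrace{(a_{21}a_{32} - a_{22}a_{31})}_{\equiv\,C_{31}}b_1 + \underbrace{(a_{12}a_{31} - a_{11}a_{32})}_{\equiv\,C_{32}}b_2 + \underbrace{(a_{11}a_{22} - a_{12}a_{21})}_{\equiv\,C_{33}}b_3\big]\\
&= a_{13}(C_{31}b_1 + C_{32}b_2 + C_{33}b_3).
\end{aligned}$$

It now follows that

$$z = \frac{C_{31}b_1 + C_{32}b_2 + C_{33}b_3}{\det A}. \qquad (6.58)$$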

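We can put Equations (6.56), (6.57), and (6.58) in matrix form:

$$\begin{pmatrix} x\\ y\\ z \end{pmatrix} = \frac{1}{\det A}\begin{pmatrix} C_{11} & C_{12} & C_{13}\\ C_{21} & C_{22} & C_{23}\\ C_{31} & C_{32} & C_{33} \end{pmatrix}\begin{pmatrix} b_1\\ b_2\\ b_3 \end{pmatrix} \equiv \frac{1}{\det A}\,C\,b. \qquad (6.59)$$

This is Equation (6.54) solved for x, y, and z. The reader may verify that multiplying A on either side of $C/\det A$ yields the identity matrix, so that $C/\det A$ is indeed the inverse of A. The rule for calculating this inverse is as follows. Construct a matrix out of the cofactors and denote it by $\hat{A}$:

$$\hat{A} = \begin{pmatrix} A_{11} & A_{12} & A_{13}\\ A_{21} & A_{22} & A_{23}\\ A_{31} & A_{32} & A_{33} \end{pmatrix},$$

where $A_{ij}$ is the cofactor of $a_{ij}$, and note that

$$\begin{pmatrix} C_{11} & C_{12} & C_{13}\\ C_{21} & C_{22} & C_{23}\\ C_{31} & C_{32} & C_{33} \end{pmatrix} = \hat{A}^t, \qquad (6.60)$$

so we obtain the important result

$$A^{-1} = \frac{\hat{A}^t}{\det A} = \frac{1}{\det A}\begin{pmatrix} A_{11} & A_{21} & A_{31}\\ A_{12} & A_{22} & A_{32}\\ A_{13} & A_{23} & A_{33} \end{pmatrix}. \qquad (6.61)$$

inverse of a 3 × 3 matrix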


Equation (6.61), although derived for a 3 × 3 matrix, applies to all square matrices, including a 2 × 2 one whose inverse was given in Theorem 6.3.1, as the reader is asked to verify.

As in the case of 2 × 2 matrices, a transformation in space that takes a basis onto another basis is invertible.
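Equation (6.61) gives a complete, if inefficient, algorithm for inverting any square matrix with nonzero determinant. The sketch below assumes integer or rational entries and uses Python's `fractions.Fraction` for exact results; it re-implements the cofactor expansion so that the block is self-contained, and its function names are our own. The 3 × 3 test matrix, with det A = 1, is an arbitrary example.

```python
from fractions import Fraction

def minor(A, i, j):
    """Delete row i and column j (0-based)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]

def det(A):
    """Cofactor expansion along the first row (Box 6.3.3)."""
    if not A:
        return 1
    return sum((-1) ** j * A[0][j] * det(minor(A, 0, j)) for j in range(len(A)))

def cofactor_matrix(A):
    """The matrix of cofactors, called A-hat in the text."""
    n = len(A)
    return [[(-1) ** (i + j) * det(minor(A, i, j)) for j in range(n)]
            for i in range(n)]

def inverse(A):
    """Equation (6.61): A^{-1} = (transpose of A-hat) / det A, in exact arithmetic."""
    d = det(A)
    if d == 0:
        raise ValueError("det A = 0, so A has no inverse")
    cof = cofactor_matrix(A)
    n = len(A)
    return [[Fraction(cof[j][i]) / d for j in range(n)] for i in range(n)]

def solve(A, b):
    """Solve Ax = b as x = A^{-1} b, as in Equation (6.59)."""
    return [sum(r * bk for r, bk in zip(row, b)) for row in inverse(A)]

A = [[1, 2, 3], [0, 1, 4], [5, 6, 0]]
print(det(A))                  # 1
print(solve(A, [14, 14, 17]))  # [Fraction(1, 1), Fraction(2, 1), Fraction(3, 1)]
```

Like the determinant recipe it is built on, this method is meant to mirror the derivation, not to be fast; its cost grows factorially with the size of the matrix.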


6.4 The Jacobian 

With the machinery of determinants at our disposal, we can formalize the geometric construction of area and volume elements in Chapter 2 into a procedure that can be used for all coordinate transformations. We start with two dimensions and consider the coordinate transformation

$$x = f(u, v), \qquad y = g(u, v). \qquad (6.62)$$

Our goal is to write the element of area in the ( u , v ) coordinate system. This 
is the area formed by infinitesimal elements in the direction of u and v , i.e., 
elements in the direction of the primary curves of the ( u , v) coordinate system. 
For an arbitrary change du and dv in u and v, the Cartesian coordinates 
change as follows: 


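$$dx = \frac{\partial f}{\partial u}\,du + \frac{\partial f}{\partial v}\,dv, \qquad dy = \frac{\partial g}{\partial u}\,du + \frac{\partial g}{\partial v}\,dv.$$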
The element in the direction of the first primary curve is obtained by holding v constant and letting u vary. This corresponds to setting dv = 0 in the above equations. It follows that the first primary (vector) length element is

$$d\mathbf{l}_1 = \hat{\mathbf{e}}_x\,dx_1 + \hat{\mathbf{e}}_y\,dy_1 = \hat{\mathbf{e}}_x\frac{\partial f}{\partial u}\,du + \hat{\mathbf{e}}_y\frac{\partial g}{\partial u}\,du. \qquad (6.63)$$

Similarly, the second primary (vector) length element, obtained by fixing u and letting v vary, is

$$d\mathbf{l}_2 = \hat{\mathbf{e}}_x\,dx_2 + \hat{\mathbf{e}}_y\,dy_2 = \hat{\mathbf{e}}_x\frac{\partial f}{\partial v}\,dv + \hat{\mathbf{e}}_y\frac{\partial g}{\partial v}\,dv. \qquad (6.64)$$


When we derived the elements of area and volume in the three coordinate systems in Chapter 2, we used the fact that the unit vectors in each system were mutually perpendicular. Therefore, the area and volume elements were obtained by mere multiplication of length elements. We are not assuming that $\hat{\mathbf{e}}_u$ and $\hat{\mathbf{e}}_v$ are perpendicular. Thus, we cannot simply multiply the lengths to get the area. However, we can use the result of Example 1.1.2, which gives the area of a parallelogram formed by two non-collinear vectors. Writing the cross product in terms of the determinant, we have



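$$d\mathbf{l}_1 \times d\mathbf{l}_2 = \det\begin{pmatrix} \hat{\mathbf{e}}_x & \hat{\mathbf{e}}_y & \hat{\mathbf{e}}_z\\ \dfrac{\partial f}{\partial u}\,du & \dfrac{\partial g}{\partial u}\,du & 0\\ \dfrac{\partial f}{\partial v}\,dv & \dfrac{\partial g}{\partial v}\,dv & 0 \end{pmatrix} = \hat{\mathbf{e}}_z \det\begin{pmatrix} \dfrac{\partial f}{\partial u} & \dfrac{\partial g}{\partial u}\\ \dfrac{\partial f}{\partial v} & \dfrac{\partial g}{\partial v} \end{pmatrix} du\,dv,$$

Jacobian matrix and Jacobian

and the area is simply the absolute value of this cross product:

$$da = \left|\det\begin{pmatrix} \dfrac{\partial f}{\partial u} & \dfrac{\partial g}{\partial u}\\ \dfrac{\partial f}{\partial v} & \dfrac{\partial g}{\partial v} \end{pmatrix}\right| du\,dv = \begin{Vmatrix} \dfrac{\partial x}{\partial u} & \dfrac{\partial y}{\partial u}\\ \dfrac{\partial x}{\partial v} & \dfrac{\partial y}{\partial v} \end{Vmatrix} du\,dv, \qquad (6.65)$$

where we substituted x and y for f and g and introduced a new notation for the (absolute value of the) determinant. The matrix whose determinant multiplies du dv is called the Jacobian matrix, and the absolute value of its determinant, the Jacobian.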


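Example 6.4.1. Let us apply Equation (6.65) to polar coordinates. The transformation is

$$x = f(r, \theta) = r\cos\theta, \qquad y = g(r, \theta) = r\sin\theta.$$

This gives

$$\frac{\partial x}{\partial r} = \frac{\partial f}{\partial r} = \cos\theta, \quad \frac{\partial y}{\partial r} = \frac{\partial g}{\partial r} = \sin\theta, \quad \frac{\partial x}{\partial \theta} = \frac{\partial f}{\partial \theta} = -r\sin\theta, \quad \frac{\partial y}{\partial \theta} = \frac{\partial g}{\partial \theta} = r\cos\theta,$$

and

$$da = \begin{Vmatrix} \dfrac{\partial x}{\partial r} & \dfrac{\partial y}{\partial r}\\ \dfrac{\partial x}{\partial \theta} & \dfrac{\partial y}{\partial \theta} \end{Vmatrix} dr\,d\theta = \begin{Vmatrix} \cos\theta & \sin\theta\\ -r\sin\theta & r\cos\theta \end{Vmatrix} dr\,d\theta = (r\cos^2\theta + r\sin^2\theta)\,dr\,d\theta = r\,dr\,d\theta,$$

which is the familiar element of area in polar coordinates.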
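Equation (6.65) can also be evaluated numerically when the partial derivatives are awkward to compute by hand. The sketch below approximates each partial derivative by a central difference with step h (our choice, not part of the text; the function names are also our own) and then applies it to the polar coordinates of Example 6.4.1, where the result should agree with r up to a small discretization error.

```python
import math

def jacobian_2d(f, g, u, v, h=1e-6):
    """Absolute value of the determinant in Equation (6.65) at (u, v),
    with each partial derivative estimated by a central difference of step h."""
    f_u = (f(u + h, v) - f(u - h, v)) / (2 * h)
    f_v = (f(u, v + h) - f(u, v - h)) / (2 * h)
    g_u = (g(u + h, v) - g(u - h, v)) / (2 * h)
    g_v = (g(u, v + h) - g(u, v - h)) / (2 * h)
    # Rows (f_u, g_u) and (f_v, g_v), as in the matrix of (6.65)
    return abs(f_u * g_v - g_u * f_v)

# Polar coordinates, Example 6.4.1: x = r cos(theta), y = r sin(theta)
def x_polar(r, theta):
    return r * math.cos(theta)

def y_polar(r, theta):
    return r * math.sin(theta)

for r, theta in [(1.0, 0.3), (2.5, 2.0), (0.7, -1.2)]:
    J = jacobian_2d(x_polar, y_polar, r, theta)
    print(f"r = {r}: Jacobian = {J:.6f}")   # should agree with r
```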


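The procedure discussed above for two dimensions can be generalized to three dimensions using the result of Example 1.1.3, which gives the volume of a parallelepiped formed by three non-coplanar vectors. Suppose the coordinate transformations are of the form

$$x = f(u, v, w), \qquad y = g(u, v, w), \qquad z = h(u, v, w).$$

Then

$$\begin{aligned}
dx &= \frac{\partial f}{\partial u}\,du + \frac{\partial f}{\partial v}\,dv + \frac{\partial f}{\partial w}\,dw,\\
dy &= \frac{\partial g}{\partial u}\,du + \frac{\partial g}{\partial v}\,dv + \frac{\partial g}{\partial w}\,dw,\\
dz &= \frac{\partial h}{\partial u}\,du + \frac{\partial h}{\partial v}\,dv + \frac{\partial h}{\partial w}\,dw.
\end{aligned}$$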


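The first primary element of length is obtained by fixing v and w and allowing u to vary; similarly for the second and third primary elements of length. We therefore have

$$\begin{aligned}
d\mathbf{l}_1 &= \hat{\mathbf{e}}_x\,dx_1 + \hat{\mathbf{e}}_y\,dy_1 + \hat{\mathbf{e}}_z\,dz_1 = \hat{\mathbf{e}}_x\frac{\partial f}{\partial u}\,du + \hat{\mathbf{e}}_y\frac{\partial g}{\partial u}\,du + \hat{\mathbf{e}}_z\frac{\partial h}{\partial u}\,du,\\
d\mathbf{l}_2 &= \hat{\mathbf{e}}_x\,dx_2 + \hat{\mathbf{e}}_y\,dy_2 + \hat{\mathbf{e}}_z\,dz_2 = \hat{\mathbf{e}}_x\frac{\partial f}{\partial v}\,dv + \hat{\mathbf{e}}_y\frac{\partial g}{\partial v}\,dv + \hat{\mathbf{e}}_z\frac{\partial h}{\partial v}\,dv,\\
d\mathbf{l}_3 &= \hat{\mathbf{e}}_x\,dx_3 + \hat{\mathbf{e}}_y\,dy_3 + \hat{\mathbf{e}}_z\,dz_3 = \hat{\mathbf{e}}_x\frac{\partial f}{\partial w}\,dw + \hat{\mathbf{e}}_y\frac{\partial g}{\partial w}\,dw + \hat{\mathbf{e}}_z\frac{\partial h}{\partial w}\,dw.
\end{aligned}$$

Example 1.1.3 now yields

$$dV = |d\mathbf{l}_1 \cdot (d\mathbf{l}_2 \times d\mathbf{l}_3)| = \left|\det\begin{pmatrix} \dfrac{\partial f}{\partial u}\,du & \dfrac{\partial g}{\partial u}\,du & \dfrac{\partial h}{\partial u}\,du\\ \dfrac{\partial f}{\partial v}\,dv & \dfrac{\partial g}{\partial v}\,dv & \dfrac{\partial h}{\partial v}\,dv\\ \dfrac{\partial f}{\partial w}\,dw & \dfrac{\partial g}{\partial w}\,dw & \dfrac{\partial h}{\partial w}\,dw \end{pmatrix}\right|.$$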


We summarize the foregoing argument in 


Theorem 6.4.2. For the coordinates u, v, and w, related to the Cartesian 
coordinates by x = f(u,v,w), y = g(u,v,w), and z = h(u,v,w), the volume 
element is given by 
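Jacobian defined

James Joseph Sylvester 1814-1897

$$dV = \begin{Vmatrix} \dfrac{\partial x}{\partial u} & \dfrac{\partial y}{\partial u} & \dfrac{\partial z}{\partial u}\\ \dfrac{\partial x}{\partial v} & \dfrac{\partial y}{\partial v} & \dfrac{\partial z}{\partial v}\\ \dfrac{\partial x}{\partial w} & \dfrac{\partial y}{\partial w} & \dfrac{\partial z}{\partial w} \end{Vmatrix} du\,dv\,dw. \qquad (6.66)$$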


The (absolute value of the) determinant multiplying du dv dw is called the 
Jacobian of the coordinate transformation. 
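The same numerical idea extends to Theorem 6.4.2. In this sketch, a map F from (u, v, w) to (x, y, z) is differentiated by central differences (the step size is our choice) and the 3 × 3 determinant of (6.66) is expanded by cofactors. As a test we use spherical coordinates, assuming the convention x = r sin θ cos φ, y = r sin θ sin φ, z = r cos θ; the output can be compared with the factor $r^2\sin\theta$ obtained geometrically in Chapter 2. The function names are illustrative only.

```python
import math

def det3(M):
    """3 x 3 determinant by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def jacobian_3d(F, p, step=1e-5):
    """|det| of the matrix in Equation (6.66) at p = (u, v, w).
    F maps (u, v, w) to (x, y, z); row k holds the partial derivatives of
    (x, y, z) with respect to the kth coordinate, as in (6.66)."""
    rows = []
    for k in range(3):
        plus, minus = list(p), list(p)
        plus[k] += step
        minus[k] -= step
        Fp, Fm = F(*plus), F(*minus)
        rows.append([(a - b) / (2 * step) for a, b in zip(Fp, Fm)])
    return abs(det3(rows))

def spherical(r, theta, phi):
    """Assumed convention: theta is the polar angle, phi the azimuth."""
    return (r * math.sin(theta) * math.cos(phi),
            r * math.sin(theta) * math.sin(phi),
            r * math.cos(theta))

r, theta, phi = 2.0, 0.8, 1.3
print(jacobian_3d(spherical, (r, theta, phi)))
print(r ** 2 * math.sin(theta))   # the two printed values should agree closely
```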


Determinants were mathematical objects created in the process of solving a system 
of linear equations. As early as 1693 Leibniz used a systematic set of indices for the 
coefficients of a system of three equations in two unknowns. By eliminating the two 
unknowns from the set of three equations, he obtained an expression involving the 
coefficients that “determined” whether a solution existed for the set of equations. 

The solution of simultaneous linear equations in two, three, and four unknowns 
by the method of determinants was created by Maclaurin around 1729. Though 
not as good in notation, his rule is the one we use today and which Cramer used 
in connection with his study of the conic sections. In 1764, Bezout systematized 
the process of determining the signs of the terms of a determinant for n equations 
in n unknowns and showed that the vanishing of the determinant is a necessary 
condition for nonzero solutions to exist. 

Vandermonde was the first to give a connected and logical exposition of the 
theory of determinants detached from any system of linear equations, although he 
used his theory mostly as applied to such systems. He also gave a rule for expanding 
a determinant by using second-order minors and their complementary minors. In 
the sense that he concentrated on determinants, he is aptly considered the founder 
of the theory. 

One of the most consistent workers in determinant theory, over a period of more than fifty years, was James Joseph Sylvester.

In 1833 he became a student at St. John's College, Cambridge, and in 1837 he took the difficult tripos examination along with two other famous mathematicians, Gregory and Green (the creator of the important Green's functions). Sylvester came second, Green, who was 20 years older than the other two, came fourth, and Duncan Gregory came fifth. (The first-place winner did little work of importance after graduating.)

At this time it was necessary for a student to sign a religious oath to the Church of England before graduating, and Sylvester, being Jewish, refused to take the oath and so could not graduate. For the same reason he was not eligible for a Smith's prize or for a Fellowship.

From 1838 Sylvester taught physics at the University of London, one of the few places that did not bar him because of his religion. Three years later he was appointed to a chair at the University of Virginia, but he resigned after a few months. A student who had been reading a newspaper in one of Sylvester's lectures insulted him, and Sylvester struck him with a sword stick. The student collapsed in shock, and Sylvester believed (wrongly) that he had killed him. He fled to New York, where he boarded the first available ship back to England.

On his return, Sylvester worked as an actuary and lawyer but also gave private mathematics lessons. His pupils included Florence Nightingale. By good fortune Cayley was also a lawyer, and both worked at the courts of Lincoln's Inn in London. Cayley and Sylvester discussed mathematics as they walked around the courts and, although very different in temperament, they became life-long friends.

Sylvester tried hard to return to mathematics as a profession, and he applied 
unsuccessfully for a lectureship in geometry at Gresham College, London, in 1854. 
Another failed application was for the chair in mathematics at the Royal Military 
Academy at Woolwich, but, after the successful applicant died within a few months 
of being appointed, Sylvester became professor of mathematics at Woolwich. Being 
at a military academy, Sylvester had to retire at age 55. At first it looked as though 
he might give up mathematics since he had published his only book at this time, 
and it was on poetry. Apparently Sylvester was proud of this work, entitled The 
Laws of Verse, since after this he sometimes signed himself “J. J. Sylvester, author 
of The Laws of Verse.” 

In 1877 Sylvester accepted a chair at the Johns Hopkins University and founded 
in 1878 the American Journal of Mathematics, the first mathematical journal in the 
USA. 

In 1883 Sylvester, although 68 years old at this time, was appointed to the Savilian chair of geometry at Oxford. However, he liked to lecture only on his own research, and this was not well received at Oxford, where students wanted only to do well in examinations. In 1892, when Sylvester was 78, Oxford appointed a deputy professor in his place, and Sylvester, by this time partially blind and suffering from loss of memory, returned to London, where he spent his last years at the Athenaeum Club.

Sylvester did important work on matrix and determinant theory, a topic in which 
he became interested during the walks with Cayley while they were at the courts 
of Lincoln’s Inn. In particular he used matrix theory to study higher-dimensional 
geometry. He also devised an improved method of determining conditions under 
which a system of polynomial equations has a solution. 

The formula for the derivative of a determinant when the elements are functions of a variable was first given in 1841 by Jacobi, who had earlier used such determinants in the change of variables in a multiple integral. In this context the determinant is called the Jacobian of the transformation (as discussed in the current section of this book).


6.5 Problems 

6.1. What vector is obtained when the vector $a_2$ of a basis $\{a_1, a_2\}$ is actively transformed with the matrix (g J)?

6.2. Show that the nonzero matrix A = (g g) cannot have an inverse. Hint: Suppose that $B = \begin{pmatrix} a & b\\ c & d \end{pmatrix}$ is the inverse of A. Calculate AB and BA, set them equal to the unit matrix, and show that no solution exists for a, b, c, and d.

6.3. Let A = (^ ) and B = (^) be arbitrary matrices. Find AB, $A^t$, and $B^t$, and show that $(AB)^t = B^tA^t$.

6.4. Find the angle between 1 + t and 1 − t when the inner product is integration over the interval (0, 1).

6.5. Instead of (0, 1), choose (−1, 1) as the interval of integration for $\mathcal{P}_1[t]$. From the basis {1, t}, construct an orthonormal basis using the Gram-Schmidt
process. 
6.6. Take the interval of integration to be (−1, +1), and find the inner product matrix for the basis {1, t} of $\mathcal{P}_1[t]$.

6.7. Find the angle between two vectors a and b, whose components in an orthonormal basis are, respectively, (1, 2) and (2, −3). Use the Gram-Schmidt
process to find the orthonormal vectors obtained from a and b. 

6.8. Use the Gram-Schmidt process to find an orthonormal basis in three 
dimensions from each of the following: 

(a) (- 1 , 1 , 1 ), ( 1 , 1 , 1 ), ( 1 , 1 , 1 ) (b) ( 1 , 2 , 2 ), ( 0 , 0 , 1 ), ( 0 , 1 , 0 ) 

6.9. (a) Find the inner product matrix associated with the basis vectors $a_1 = \hat{\mathbf{e}}_x + \hat{\mathbf{e}}_y$, $a_2 = \hat{\mathbf{e}}_x + \hat{\mathbf{e}}_z$, and $a_3 = \hat{\mathbf{e}}_y + \hat{\mathbf{e}}_z$.

(b) Calculate the inner product of two vectors a and b, whose components in 
the basis above are, respectively, (1,—1,2) and (0,2,3). 

(c) Use the Gram-Schmidt process to find three orthonormal vectors out of 
the basis of (a). 

6.10. Use the Gram-Schmidt process to find orthonormal vectors out of the three vectors (2, −1, 3), (−1, 1, −2), and (3, 1, 2). What do you get as the last vector? What can you say about the linear independence of the original vectors?

6.11. What is the angle between the second and fourth vectors in the standard basis of $\mathcal{P}_3[t]$ when the interval of integration of the inner product is (0, 1)? Between the first and fourth vectors?

6.12. Calculate the inner product matrix for the standard basis of $\mathcal{P}_3[t]$ when the interval of integration of the inner product is (−1, +1). Now find the angles between all pairs of vectors in that basis.

6.13. The inner product matrix in a basis $\{a_1, a_2\}$ is given by

(a) Calculate the cosine of the angle between $a_1$ and $a_2$.

(b) Suppose that $a = -a_1 + a_2$ and $b = 2a_1 - a_2$. Calculate |a|, |b|, a · b, and the cosine of the angle between a and b.

6.14. Let $a_1 = 1 + t$ and $a_2 = 1 - t$ be a basis of $\mathcal{P}_1[t]$. Define the inner product as the integral of products of polynomials over the interval (0, a) with a > 0.

(a) Determine a such that $a_1$ and $a_2$ are orthogonal.

(b) Given this value of a, calculate $|a_1|$ and $|a_2|$.

(c) Find two orthogonal polynomials $\{e_1, e_2\}$ of unit length that form a basis for $\mathcal{P}_1[t]$.

(d) Write the polynomial b = 3 − 2t as a linear combination of $e_1$ and $e_2$.

(e) Calculate b · b using the definition of the inner product.

(f) Calculate b · b by squaring (and then adding) the components in $\{e_1, e_2\}$.
6.15. Show that the matrix C defined in Equations (6.56)-(6.59) is indeed the transpose of the matrix $\hat{A}$ of cofactors of A.

6.16. Show directly that the matrix given in Equation (6.61) is indeed the 
inverse of the matrix A. 

6.17. From the transformation rules (1.8) and (1.9) giving the Cartesian coordinates as functions of cylindrical and spherical coordinates, and using the Jacobian (6.66), find the volume elements in cylindrical and spherical coordinates.

6.18. The elliptic coordinates are given by

$$x = a\cosh u\cos\theta, \qquad y = a\sinh u\sin\theta.$$

Using the Jacobian for two variables (6.65), find the element of area for the elliptic coordinate system.

6.19. The elliptic cylindrical coordinates are given by

$$x = a\cosh u\cos\theta, \qquad y = a\sinh u\sin\theta, \qquad z = z.$$

Using the Jacobian for three variables (6.66), find the element of volume for the elliptic cylindrical coordinate system.

6.20. The prolate spheroidal coordinates are given by

$$x = a\sinh u\sin\theta\cos\varphi, \qquad y = a\sinh u\sin\theta\sin\varphi, \qquad z = a\cosh u\cos\theta.$$

Using the Jacobian for three variables (6.66), find the element of volume for the prolate spheroidal coordinate system.

6.21. The toroidal coordinates are given by

$$x = \frac{a\sinh\theta\cos\varphi}{\cosh\theta - \cos u}, \qquad y = \frac{a\sinh\theta\sin\varphi}{\cosh\theta - \cos u}, \qquad z = \frac{a\sin u}{\cosh\theta - \cos u}.$$

Using the Jacobian for three variables (6.66), find the element of volume for the toroidal coordinate system.
6.22. A coordinate system (R, θ, φ) in space is defined by

$$x = R\cos\theta\cos\varphi + b\cos\varphi, \qquad y = R\cos\theta\sin\varphi + b\sin\varphi, \qquad z = R\sin\theta,$$

where b is a constant and 0 < R < b. Using the Jacobian for three variables (6.66), find the element of volume for this coordinate system.




Chapter 7 

Finite-Dimensional Vector 
Spaces 


Human visual perception of dimension is limited to two and three, the plane and space. However, human mental perception, and the human ability to abstract, recognize no bounds. If this abstraction were a mere useless mental exercise, we would not bother to add this chapter to the book. It is an intriguing coincidence that Nature plays along with the tune of human mental abstraction in the most harmonious way. This harmony was revealed to Hermann Minkowski in 1908 when he convinced physicists and mathematicians alike that the most natural setting for the newly discovered special theory of relativity was a four-dimensional space. Eight years later, Einstein used this concept to formulate his general theory of relativity, which is the only viable theory of gravity for the large-scale structure of space and time. In 1921, Kaluza, in a most beautiful idea, unified the electromagnetic interaction with gravity using a five-dimensional spacetime. Today string theory, one of the most promising candidates for the unification of all forces of nature, uses 10- or 11-dimensional spacetime; and the language of quantum mechanics (a theory that describes atomic, molecular, and solid-state physics, as well as all of chemistry) is best spoken in an infinite-dimensional space, called Hilbert space.

formal definition of a linear operator

The key to this multidimensional abstraction is Descartes' ingenious idea of translating Euclid's geometry into the language of coordinates, whereby the abstract Euclidean point in a plane is given the two coordinates (x, y), and that in space, the three coordinates (x, y, z), where x, y, and z are real numbers. Once this crucial step is taken, the generalization to multidimensional spaces becomes a matter of adding more and more coordinates to the list: (x, y, z, w) is a point in a four-dimensional space, and (x, y, z, w, u) describes a point in a five-dimensional space. In the spirit of this chapter, we want to identify points with vectors as in the plane and space, in which we drew a directed line segment from the origin to the point in question. In general, an n-dimensional Cartesian vector x is

$$x = (x_1, x_2, \ldots, x_n) \qquad (7.1)$$

in which $x_j$ is called the jth component of the vector. These vectors have all the properties expected of vectors. You can add them,

$$x + y = (x_1, x_2, \ldots, x_n) + (y_1, y_2, \ldots, y_n) = (x_1 + y_1, x_2 + y_2, \ldots, x_n + y_n),$$

you can multiply a vector by a number,

$$\alpha x = \alpha(x_1, x_2, \ldots, x_n) = (\alpha x_1, \alpha x_2, \ldots, \alpha x_n),$$

and the zero vector is $0 = (0, 0, \ldots, 0)$. Two vectors are equal if and only if their corresponding components are equal. Sometimes it will be convenient to denote these vectors as columns rather than rows.

The set of real numbers, or the set of points on a line, is denoted by $\mathbb{R}$. It is common to denote the set of points in a plane (or, in the language of Cartesian coordinates, the set of pairs of real numbers (x, y)) by $\mathbb{R}^2$, and the set of points in space by $\mathbb{R}^3$. Generalizing this notation, we denote the set of points in the n-dimensional Cartesian space by $\mathbb{R}^n$. We now have an infinite collection of "spaces" of various dimensions, starting with the one-dimensional real line $\mathbb{R}^1 = \mathbb{R}$, moving on to the two-dimensional plane $\mathbb{R}^2$ and the three-dimensional space $\mathbb{R}^3$, and continuing to all the abstract spaces $\mathbb{R}^n$ with $n \geq 4$. The concepts of linear combination, linear independence, and basis are exactly the same as before. The vectors

$$e_1 = (1, 0, \ldots, 0), \quad e_2 = (0, 1, \ldots, 0), \quad \ldots, \quad e_n = (0, 0, \ldots, 1) \qquad (7.2)$$

form a basis for $\mathbb{R}^n$, called the standard basis.


7.1 Linear Transformations 

A linear transformation or a linear operator is a correspondence that takes a vector in one space and produces a vector in another space in such a way that the operations of summation of vectors and multiplication of vectors by numbers are preserved. If we denote the linear transformation by T, then in mathematical symbolism, the above statement becomes

$$T(\alpha x + \beta y) = \alpha T(x) + \beta T(y). \qquad (7.3)$$

Matrices are prototypes of linear transformations. In fact, we saw earlier 
that it was possible to transform vectors in the plane to vectors in space 
and vice versa via 3 x 2 or 2 x 3 matrices. We did not attempt to verify 
Equation (7.3) for those transformations, but the reader can easily do so. 
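In fact, denoting vectors of $\mathbb{R}^n$ and $\mathbb{R}^m$ by column vectors, we can immediately generalize Equations (6.36) and (6.37) to

$$\begin{pmatrix} \alpha'_1\\ \alpha'_2\\ \vdots\\ \alpha'_m \end{pmatrix} = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n}\\ a_{21} & a_{22} & \cdots & a_{2n}\\ \vdots & \vdots & \ddots & \vdots\\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix}\begin{pmatrix} \alpha_1\\ \alpha_2\\ \vdots\\ \alpha_n \end{pmatrix} \quad\text{or}\quad \alpha' = A\alpha, \qquad (7.4)$$

where A is an m × n matrix, i.e., it has m rows and n columns, and its elements are real numbers. The reader may verify that Equation (7.4) is a linear transformation that maps vectors of $\mathbb{R}^n$ to those of $\mathbb{R}^m$.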
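The claim that (7.4) satisfies (7.3) can be illustrated with a short Python sketch. Integer entries are used so that both sides are computed exactly; one random test is an illustration, not a proof (the proof is simply the distributive law applied to each row sum). The sizes m = 3 and n = 5, the seed, and the helper names are arbitrary choices of ours.

```python
import random

def apply(A, x):
    """Equation (7.4): the m-vector A x for an m x n matrix A (list of rows)."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]

def combine(alpha, x, beta, y):
    """The linear combination alpha x + beta y, component by component."""
    return [alpha * xi + beta * yi for xi, yi in zip(x, y)]

rng = random.Random(1)
m, n = 3, 5
A = [[rng.randint(-5, 5) for _ in range(n)] for _ in range(m)]
x = [rng.randint(-5, 5) for _ in range(n)]
y = [rng.randint(-5, 5) for _ in range(n)]
alpha, beta = 3, -2

lhs = apply(A, combine(alpha, x, beta, y))             # T(alpha x + beta y)
rhs = combine(alpha, apply(A, x), beta, apply(A, y))   # alpha T(x) + beta T(y)
print(len(lhs), lhs == rhs)   # 3 True
```

Note that the output has m = 3 components while the input has n = 5, as expected for a map from $\mathbb{R}^n$ to $\mathbb{R}^m$.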

Other linear operators of importance are various differential operators, i.e., derivatives of various orders. For example, it is easily verified that d/dx is a linear operator acting on the space of differentiable functions.¹ This is because

$$\frac{d}{dx}(\alpha f + \beta g) = \alpha\frac{df}{dx} + \beta\frac{dg}{dx}$$

for α and β real constants. Similarly, $d^2/dx^2$ and derivatives of higher orders, as well as partial derivatives of various kinds and orders, are all linear operators. In fact, even when these derivatives are multiplied by functions (on the left), they are still linear. In particular, the second-order linear differential operator

$$L = p_2(x)\frac{d^2}{dx^2} + p_1(x)\frac{d}{dx} + p_0(x)$$

is indeed a linear operator.

If a linear transformation T maps vectors of $\mathbb{R}^n$ to vectors of $\mathbb{R}^m$, and S maps vectors of $\mathbb{R}^m$ to vectors of $\mathbb{R}^k$, then we can "compose" or "multiply" the two transformations to obtain a linear transformation ST, which maps vectors of $\mathbb{R}^n$ to vectors of $\mathbb{R}^k$. In terms of matrices, T is represented by an m × n matrix T, S is represented by a k × m matrix S, and ST is represented by a k × n matrix, which is the product of S and T with S to the left of T. The product of matrices is as outlined in Box 6.1.3.


Box 7.1.1. If $\mathsf{A}$ is a $k\times m$ matrix, and $\mathsf{B}$ is an $m\times n$ matrix, then $\mathsf{AB}$ is a $k\times n$ matrix whose entries are given by Box 6.1.3.


The product $\mathsf{BA}$ is not defined unless $k = n$, in which case $\mathsf{BA}$ will be an $m\times m$ matrix.
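Box 7.1.1 and the remark after it can be illustrated with a short sketch. Assuming matrices are stored as lists of rows (the name `mat_mul` and the sample matrices are hypothetical), composing a map $\mathbb{R}^3 \to \mathbb{R}^2$ with a map $\mathbb{R}^2 \to \mathbb{R}^1$ gives a $1\times 3$ matrix, while the reversed product is rejected because the inner dimensions do not match.

```python
def mat_mul(S, T):
    """Product ST of a k x m matrix S and an m x n matrix T (lists of rows).

    Entry (i, j) of ST is the sum over l of S[i][l] * T[l][j]; the product is
    defined only when the number of columns of S equals the number of rows of T.
    """
    m = len(T)
    if any(len(row) != m for row in S):
        raise ValueError("columns of S must equal rows of T")
    n = len(T[0])
    return [[sum(S[i][l] * T[l][j] for l in range(m)) for j in range(n)]
            for i in range(len(S))]


T = [[1, 2, 0],
     [0, -1, 3]]   # 2 x 3: maps R^3 to R^2
S = [[2, 1]]       # 1 x 2: maps R^2 to R^1

ST = mat_mul(S, T)  # 1 x 3: maps R^3 to R^1
print(ST)           # [[2, 3, 3]]

try:
    mat_mul(T, S)   # (2 x 3)(1 x 2): inner sizes 3 and 1 differ
except ValueError as err:
    print("TS is undefined:", err)
```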

Using polynomials, we can generate multidimensional vector spaces by adding increasing powers of $t$. Then, the collection $\mathcal{P}_n[t]$ of polynomials of degree $n$ and less becomes an $(n+1)$-dimensional vector space. A convenient basis for this vector space is $\{1, t, t^2, \ldots, t^n\}$, which we call the standard basis of $\mathcal{P}_n[t]$. The reader may verify that the operation of differentiation (of any order) is a linear transformation on $\mathcal{P}_n[t]$ which can be represented by matrices as done in Example 6.2.2.

¹ The reader may want to check that the collection of differentiable functions is indeed a vector space with the “zero function” being the zero vector.

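Example 7.1.1. Let us find the matrix that represents the operation of second differentiation on $\mathcal{P}_3[t]$ using the standard basis of $\mathcal{P}_3[t]$. Recall that we only need to apply the second derivative to the basis vectors $\mathbf{f}_1 = 1$, $\mathbf{f}_2 = t$, $\mathbf{f}_3 = t^2$, and $\mathbf{f}_4 = t^3$. We use a prime to denote the transformed vector:

$$
\begin{aligned}
\mathbf{f}'_1 &= \frac{d^2}{dt^2}(1) = 0 = 0\cdot\mathbf{f}_1 + 0\cdot\mathbf{f}_2,\\
\mathbf{f}'_2 &= \frac{d^2}{dt^2}(t) = 0 = 0\cdot\mathbf{f}_1 + 0\cdot\mathbf{f}_2,\\
\mathbf{f}'_3 &= \frac{d^2}{dt^2}(t^2) = 2 = 2\cdot\mathbf{f}_1 + 0\cdot\mathbf{f}_2,\\
\mathbf{f}'_4 &= \frac{d^2}{dt^2}(t^3) = 6t = 0\cdot\mathbf{f}_1 + 6\cdot\mathbf{f}_2,
\end{aligned}
$$

where we have anticipated the fact that double differentiation of $\mathcal{P}_3[t]$ results in $\mathcal{P}_1[t]$. Following the rule of Box 6.2.2, we can write the transformation matrix as

$$\begin{pmatrix} 0 & 0 & 2 & 0\\ 0 & 0 & 0 & 6\end{pmatrix}.$$

We may verify that the coefficients in $\mathcal{P}_1[t]$ of the second derivative of an arbitrary polynomial $f(t) = a_0 + a_1t + a_2t^2 + a_3t^3$ can be obtained by the product of the matrix of the second derivative and the $4\times 1$ column vector representing $f(t)$. In fact,

$$\begin{pmatrix} 0 & 0 & 2 & 0\\ 0 & 0 & 0 & 6\end{pmatrix}\begin{pmatrix} a_0\\ a_1\\ a_2\\ a_3\end{pmatrix} = \begin{pmatrix} 2a_2\\ 6a_3\end{pmatrix}.$$

These are the two coefficients of the resulting polynomial in $\mathcal{P}_1[t]$. The polynomial itself is $2a_2 + 6a_3t$, which is indeed the second derivative of the third-degree polynomial $f(t)$. ■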
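The construction of Example 7.1.1 generalizes to any order of differentiation and any $\mathcal{P}_n[t]$. A minimal sketch, assuming a polynomial is stored as its list of coefficients $[a_0, a_1, \ldots, a_n]$ in the standard basis; `derivative_matrix` and `apply` are illustrative names:

```python
def derivative_matrix(n, k):
    """Matrix of d^k/dt^k from P_n[t] to P_{n-k}[t] in the standard bases.

    Column j (0-based) holds the coefficients of d^k/dt^k (t^j), which is
    j (j-1) ... (j-k+1) t^(j-k) when j >= k and 0 otherwise.
    """
    if not 0 <= k <= n:
        raise ValueError("need 0 <= k <= n")
    M = [[0] * (n + 1) for _ in range(n - k + 1)]
    for j in range(k, n + 1):
        factor = 1
        for s in range(k):
            factor *= j - s
        M[j - k][j] = factor
    return M


def apply(M, coeffs):
    """Multiply matrix M by the column vector of polynomial coefficients."""
    return [sum(m_ij * c for m_ij, c in zip(row, coeffs)) for row in M]


D2 = derivative_matrix(3, 2)
print(D2)                        # [[0, 0, 2, 0], [0, 0, 0, 6]]

# f(t) = 5 - t + 4t^2 + 2t^3, so f''(t) = 8 + 12t
print(apply(D2, [5, -1, 4, 2]))  # [8, 12]
```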


7.2 Inner Product 

Since the concepts of length and angle are not familiar for $\mathbb{R}^n$, we need to define the inner product first and then deduce those concepts. We can generalize the usual inner product of $\mathbb{R}^2$ and $\mathbb{R}^3$ in terms of components of vectors. Let

$$\mathbf{a} = (a_1, a_2, \ldots, a_n) \quad\text{and}\quad \mathbf{b} = (b_1, b_2, \ldots, b_n).$$

Then

$$\mathbf{a}\cdot\mathbf{b} = a_1b_1 + a_2b_2 + \cdots + a_nb_n \tag{7.5}$$

is the immediate generalization of the dot product to $\mathbb{R}^n$, written in terms of components in the standard basis.
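This, of course, is not the most general inner product. For that, we need an inner product matrix $\mathsf{G}$. As in the case of the plane and space, this is simply a symmetric $n\times n$ matrix whose elements determine the dot products of the vectors of the basis in which we are working:

$$
\mathsf{G} = \begin{pmatrix}
g_{11} & g_{12} & \cdots & g_{1n}\\
g_{21} & g_{22} & \cdots & g_{2n}\\
\vdots & \vdots & \ddots & \vdots\\
g_{n1} & g_{n2} & \cdots & g_{nn}
\end{pmatrix},
\qquad g_{ij} = g_{ji}, \quad i, j = 1, 2, \ldots, n.
\tag{7.6}
$$

Example 7.2.1. Let us find the inner product matrix for the basis $\{1, t, t^2, t^3\}$ of $\mathcal{P}_3[t]$. As usual, we assume that the interval of integration for the inner product is $(0, 1)$. Because of the symmetry of the matrix and the fact that we have already calculated the $3\times 3$ submatrix of $\mathsf{G}$, we need to find $g_{14}$, $g_{24}$, $g_{34}$, and $g_{44}$. Once again, let $\mathbf{f}_1 = f_1(t) = 1$, $\mathbf{f}_2 = f_2(t) = t$, $\mathbf{f}_3 = f_3(t) = t^2$, and $\mathbf{f}_4 = f_4(t) = t^3$; then

$$g_{14} = \mathbf{f}_1\cdot\mathbf{f}_4 = \int_0^1 f_1(t)f_4(t)\,dt = \int_0^1 t^3\,dt = \frac{1}{4},$$

$$g_{24} = \mathbf{f}_2\cdot\mathbf{f}_4 = \int_0^1 f_2(t)f_4(t)\,dt = \int_0^1 t^4\,dt = \frac{1}{5}.$$

Similarly, $g_{34} = \int_0^1 t^5\,dt = \frac{1}{6}$ and $g_{44} = \int_0^1 t^6\,dt = \frac{1}{7}$. It follows that

$$
\mathsf{G} = \begin{pmatrix}
1 & \tfrac12 & \tfrac13 & \tfrac14\\
\tfrac12 & \tfrac13 & \tfrac14 & \tfrac15\\
\tfrac13 & \tfrac14 & \tfrac15 & \tfrac16\\
\tfrac14 & \tfrac15 & \tfrac16 & \tfrac17
\end{pmatrix}.
$$

This matrix can be used to find the dot product of any two vectors in terms of their components in the basis $\{1, t, t^2, t^3\}$ of $\mathcal{P}_3[t]$. ■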
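The entries of $\mathsf{G}$ in Example 7.2.1 follow a simple pattern: with 1-based indices, $g_{ij} = \int_0^1 t^{i+j-2}\,dt = 1/(i+j-1)$. The sketch below (the function name is an illustrative choice) builds this matrix exactly with Python's `fractions` module; it uses 0-based indices, so the entry reads $1/(i+j+1)$ there.

```python
from fractions import Fraction


def monomial_gram_matrix(n):
    """Inner product matrix G of the basis {1, t, ..., t^n} of P_n[t].

    With f . g defined as the integral of f(t) g(t) over (0, 1), the entry for
    t^i and t^j (0-based indices) is the integral of t^(i+j), i.e. 1/(i+j+1).
    """
    return [[Fraction(1, i + j + 1) for j in range(n + 1)] for i in range(n + 1)]


G = monomial_gram_matrix(3)
for row in G:
    print(" ".join(f"{str(g):>4}" for g in row))
```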


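If $\mathbf{a}$ and $\mathbf{b}$ have components $(a_1, a_2, \ldots, a_n)$ and $(b_1, b_2, \ldots, b_n)$, then their inner product is given by

$$
\mathbf{a}\cdot\mathbf{b} = \begin{pmatrix} a_1 & a_2 & \cdots & a_n\end{pmatrix}
\begin{pmatrix}
g_{11} & g_{12} & \cdots & g_{1n}\\
g_{21} & g_{22} & \cdots & g_{2n}\\
\vdots & \vdots & \ddots & \vdots\\
g_{n1} & g_{n2} & \cdots & g_{nn}
\end{pmatrix}
\begin{pmatrix} b_1\\ b_2\\ \vdots\\ b_n\end{pmatrix}.
\tag{7.7}
$$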


As usual, if this expression is zero, we say that $\mathbf{a}$ and $\mathbf{b}$ are $\mathsf{G}$-orthogonal. For an orthonormal basis, the inner product matrix $\mathsf{G}$ becomes the unit matrix,² and we recover the usual inner product of vectors in terms of components.

² Only if the inner product is positive definite.

With a positive definite inner product at hand, we can define the length of a vector as the (positive) square root of the inner product of the vector with itself. Can we define the angle as well? We can always define

$$\cos\theta = \frac{\mathbf{a}\cdot\mathbf{b}}{|\mathbf{a}|\,|\mathbf{b}|} = \frac{\mathbf{a}\cdot\mathbf{b}}{\sqrt{\mathbf{a}\cdot\mathbf{a}}\,\sqrt{\mathbf{b}\cdot\mathbf{b}}}.$$

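But how do we know that the ratio on the RHS has absolute value less than or equal to one? After all, a true cosine must have this property! It is an amazing fact of nature that any positive definite inner product has precisely this property. To show this, let $\mathbf{a}$ and $\mathbf{b}$ be two vectors in any vector space on which an inner product is defined. Denote the unit vector in the $\mathbf{a}$ direction by $\mathbf{e}_a$, and construct the vector

$$\mathbf{b}' = \mathbf{b} - (\mathbf{b}\cdot\mathbf{e}_a)\,\mathbf{e}_a, \tag{7.8}$$

where $\mathbf{b}\cdot\mathbf{e}_a$ is a number. This vector is easily seen to be perpendicular to $\mathbf{e}_a$ (and therefore to $\mathbf{a}$). If the inner product is positive definite, then

$$\mathbf{b}'\cdot\mathbf{b}' \ge 0 \;\Rightarrow\; [\mathbf{b} - (\mathbf{b}\cdot\mathbf{e}_a)\mathbf{e}_a]\cdot[\mathbf{b} - (\mathbf{b}\cdot\mathbf{e}_a)\mathbf{e}_a] \ge 0$$

or

$$\underbrace{\mathbf{b}\cdot\mathbf{b}}_{=|\mathbf{b}|^2} - 2\,\underbrace{\mathbf{b}\cdot[(\mathbf{b}\cdot\mathbf{e}_a)\mathbf{e}_a]}_{=(\mathbf{b}\cdot\mathbf{e}_a)^2} + (\mathbf{b}\cdot\mathbf{e}_a)^2\,\underbrace{\mathbf{e}_a\cdot\mathbf{e}_a}_{=1} \ge 0.$$

It follows that

$$|\mathbf{b}|^2 - (\mathbf{b}\cdot\mathbf{e}_a)^2 \ge 0 \;\Rightarrow\; |\mathbf{b}|^2 \ge \left(\mathbf{b}\cdot\frac{\mathbf{a}}{|\mathbf{a}|}\right)^2$$

and

$$|\mathbf{b}|^2 \ge \left(\frac{\mathbf{b}\cdot\mathbf{a}}{|\mathbf{a}|}\right)^2 \;\Rightarrow\; |\mathbf{b}|^2\,|\mathbf{a}|^2 \ge (\mathbf{b}\cdot\mathbf{a})^2.$$

This is the desired inequality.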


Box 7.2.1. (Schwarz Inequality). If $\mathbf{a}$ and $\mathbf{b}$ are two nonzero vectors of a vector space for which a positive definite inner product is defined, then

$$|\mathbf{a}|\,|\mathbf{b}| \ge |\mathbf{a}\cdot\mathbf{b}|.$$

The equality holds only if $\mathbf{b}$ is a multiple of $\mathbf{a}$.

The last statement follows from the fact that, when the inner product is positive definite, $\mathbf{b}'\cdot\mathbf{b}' = 0$ only if $\mathbf{b}' = \mathbf{0}$, i.e., by Equation (7.8), only if $\mathbf{b} = (\mathbf{b}\cdot\mathbf{e}_a)\mathbf{e}_a$ is a multiple of $\mathbf{a}$.

The Schwarz inequality holds not only for finite-dimensional vector spaces such as $\mathbb{R}^n$ or $\mathcal{P}_n[t]$, but also for infinite-dimensional vector spaces. It is one of the most important inequalities in mathematical physics. One of its consequences is that we can actually define the angle between two nonzero vectors in $\mathbb{R}^n$ or $\mathcal{P}_n[t]$ (or any other vector space, finite or infinite, for which a positive definite inner product exists).







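Example 7.2.2. What is the angle between the third and fourth vectors in the standard basis of $\mathcal{P}_3[t]$ when the interval of integration of the inner product is $(0, 1)$? All the inner products are calculated in Example 7.2.1. Therefore,

$$\cos\theta = \frac{\mathbf{f}_3\cdot\mathbf{f}_4}{\sqrt{\mathbf{f}_3\cdot\mathbf{f}_3}\,\sqrt{\mathbf{f}_4\cdot\mathbf{f}_4}} = \frac{g_{34}}{\sqrt{g_{33}}\,\sqrt{g_{44}}} = \frac{1/6}{\sqrt{1/5}\,\sqrt{1/7}} = \frac{\sqrt{35}}{6}$$

or $\theta = 9.594^\circ$. ■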
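Example 7.2.2 can be reproduced from Equation (7.7) and the matrix of Example 7.2.1. In this sketch, which assumes vectors are given by their component lists in the basis $\{1, t, t^2, t^3\}$, `g_inner`, `length`, and `angle_degrees` are hypothetical helper names; the clamp on the cosine only guards against floating-point round-off, since the Schwarz inequality already keeps the exact ratio in $[-1, 1]$.

```python
import math
from fractions import Fraction


def monomial_gram_matrix(n):
    """G for the basis {1, t, ..., t^n} with the inner product on (0, 1)."""
    return [[Fraction(1, i + j + 1) for j in range(n + 1)] for i in range(n + 1)]


def g_inner(a, b, G):
    """Equation (7.7): the row vector a, times G, times the column vector b."""
    return sum(a[i] * G[i][j] * b[j]
               for i in range(len(a)) for j in range(len(b)))


def length(a, G):
    """Length of a vector: positive square root of its inner product with itself."""
    return math.sqrt(g_inner(a, a, G))


def angle_degrees(a, b, G):
    """Angle from cos(theta) = a.b / (|a| |b|)."""
    cos_theta = g_inner(a, b, G) / (length(a, G) * length(b, G))
    cos_theta = max(-1.0, min(1.0, cos_theta))   # guard against round-off
    return math.degrees(math.acos(cos_theta))


G = monomial_gram_matrix(3)
f3 = [0, 0, 1, 0]   # components of t^2 in the basis {1, t, t^2, t^3}
f4 = [0, 0, 0, 1]   # components of t^3
print(g_inner(f3, f4, G))                  # 1/6
print(round(angle_degrees(f3, f4, G), 3))  # 9.594
```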

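As in the case of the plane and space, it is convenient to construct orthonormal basis vectors in $\mathbb{R}^n$. This is done by the Gram-Schmidt process, which can easily be generalized. Suppose $B = \{\mathbf{a}_1, \mathbf{a}_2, \ldots, \mathbf{a}_n\}$ is a basis for $\mathbb{R}^n$. Again, to avoid complications, we assume that the inner product is Euclidean so that the inner product of every nonzero vector with itself is positive. We know how to construct three orthonormal vectors out of $\{\mathbf{a}_1, \mathbf{a}_2, \mathbf{a}_3\}$; we did that in our discussion of the space vectors. Call these new orthonormal vectors $\{\mathbf{e}_1, \mathbf{e}_2, \mathbf{e}_3\}$. Now construct the vector $\mathbf{a}'_4$:

$$\mathbf{a}'_4 = \mathbf{a}_4 - (\mathbf{a}_4\cdot\mathbf{e}_1)\mathbf{e}_1 - (\mathbf{a}_4\cdot\mathbf{e}_2)\mathbf{e}_2 - (\mathbf{a}_4\cdot\mathbf{e}_3)\mathbf{e}_3,$$

which is obtained from $\mathbf{a}_4$ by taking away its projections along $\mathbf{e}_1$, $\mathbf{e}_2$, and $\mathbf{e}_3$. Now note that

$$\mathbf{e}_1\cdot\mathbf{a}'_4 = \mathbf{e}_1\cdot\mathbf{a}_4 - (\mathbf{a}_4\cdot\mathbf{e}_1)\underbrace{\mathbf{e}_1\cdot\mathbf{e}_1}_{=1} - (\mathbf{a}_4\cdot\mathbf{e}_2)\underbrace{\mathbf{e}_2\cdot\mathbf{e}_1}_{=0} - (\mathbf{a}_4\cdot\mathbf{e}_3)\underbrace{\mathbf{e}_3\cdot\mathbf{e}_1}_{=0} = 0.$$

Similarly, $\mathbf{e}_2\cdot\mathbf{a}'_4 = 0$ and $\mathbf{e}_3\cdot\mathbf{a}'_4 = 0$; i.e., $\mathbf{a}'_4$ is orthogonal to $\mathbf{e}_1$, $\mathbf{e}_2$, and $\mathbf{e}_3$. This suggests defining $\mathbf{e}_4$ as

$$\mathbf{e}_4 = \frac{\mathbf{a}'_4}{|\mathbf{a}'_4|} = \frac{\mathbf{a}'_4}{\sqrt{\mathbf{a}'_4\cdot\mathbf{a}'_4}}.$$

This process can continue until we come up with $n$ orthonormal vectors. This will happen only if the $n$ vectors with which we started are linearly independent.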


Box 7.2.2. If $\{\mathbf{a}_1, \mathbf{a}_2, \ldots, \mathbf{a}_n\}$ are linearly independent vectors of $\mathbb{R}^n$, then we can construct a set of $n$ orthonormal vectors out of them by the Gram-Schmidt process.
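The steps leading to Box 7.2.2 translate directly into a loop. The sketch below assumes the Euclidean inner product on $\mathbb{R}^n$ and floating-point arithmetic; the function names, the tolerance used to detect linear dependence, and the sample vectors are illustrative choices.

```python
import math


def dot(u, v):
    """Euclidean inner product of two vectors of R^n."""
    return sum(u_i * v_i for u_i, v_i in zip(u, v))


def gram_schmidt(vectors, tol=1e-12):
    """Orthonormalize linearly independent vectors of R^n.

    For each a_k, subtract its projections (a_k . e_i) e_i along the unit
    vectors already built, then divide the remainder a'_k by its length.
    """
    basis = []
    for a in vectors:
        a_prime = list(a)
        for e in basis:
            c = dot(a, e)
            a_prime = [x - c * y for x, y in zip(a_prime, e)]
        norm = math.sqrt(dot(a_prime, a_prime))
        if norm < tol:
            raise ValueError("the vectors are linearly dependent")
        basis.append([x / norm for x in a_prime])
    return basis


vectors = [[1, 1, 0, 0], [1, 0, 1, 0], [0, 1, 1, 1], [1, 1, 1, 1]]
E = gram_schmidt(vectors)
for e_i in E:
    print([round(dot(e_i, e_j), 6) for e_j in E])   # entries approximate delta_ij
```

Feeding in a linearly dependent set makes some $\mathbf{a}'_k$ vanish, which is why the sketch raises an error instead of dividing by zero.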


An orthonormal basis will be denoted by $\{\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n\}$, where, as usual, the symbol $\mathbf{e}$ stands for unit vectors. We can abbreviate the orthonormal property of these vectors by writing

$$\mathbf{e}_i\cdot\mathbf{e}_j = \begin{cases} 1 & \text{if } i = j,\\ 0 & \text{if } i \ne j.\end{cases}$$

There is a symbol that shortens the above statement even further. It is called the Kronecker delta and denoted by $\delta_{ij}$. It is defined by

$$\delta_{ij} = \begin{cases} 1 & \text{if } i = j,\\ 0 & \text{if } i \ne j.\end{cases} \tag{7.9}$$

Therefore, the orthonormality condition can be expressed as

$$\mathbf{e}_i\cdot\mathbf{e}_j = \delta_{ij}. \tag{7.10}$$

We shall see many examples of the use of the Kronecker delta in the sequel. 

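Transformations that leave the inner products unchanged can be obtained in exactly the same way as for the plane and the space. For $\mathsf{A}$ to preserve the inner product, we need to have

$$\tilde{\mathsf{A}}\mathsf{G}\mathsf{A} = \mathsf{G}, \tag{7.11}$$

i.e., $\mathsf{A}$ has to be $\mathsf{G}$-orthogonal. If $\mathsf{G}$ is the identity matrix, then $\mathsf{A}$ can be thought of as an $n$-dimensional rigid rotation and is simply called orthogonal; it satisfies

$$\tilde{\mathsf{A}}\mathsf{A} = \mathsf{1} \tag{7.12}$$

or

$$
\begin{pmatrix}
a_{11} & a_{21} & \cdots & a_{n1}\\
a_{12} & a_{22} & \cdots & a_{n2}\\
\vdots & \vdots & \ddots & \vdots\\
a_{1n} & a_{2n} & \cdots & a_{nn}
\end{pmatrix}
\begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n}\\
a_{21} & a_{22} & \cdots & a_{2n}\\
\vdots & \vdots & \ddots & \vdots\\
a_{n1} & a_{n2} & \cdots & a_{nn}
\end{pmatrix}
=
\begin{pmatrix}
1 & 0 & \cdots & 0\\
0 & 1 & \cdots & 0\\
\vdots & \vdots & \ddots & \vdots\\
0 & 0 & \cdots & 1
\end{pmatrix}.
$$

It should be clear from this that the columns of the matrix $\mathsf{A}$, considered as vectors, have unit length and are mutually orthogonal in the usual Euclidean inner product.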
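Condition (7.12) can be tested by forming $\tilde{\mathsf{A}}\mathsf{A}$ and comparing it with the unit matrix. A minimal sketch with floating-point entries; the helper names, the rotation about the $z$-axis, and the tolerance are illustrative choices:

```python
import math


def transpose(A):
    return [list(col) for col in zip(*A)]


def mat_mul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def is_orthogonal(A, tol=1e-12):
    """True if transpose(A) A is the unit matrix (Equation (7.12)), i.e. the
    columns of A are unit vectors that are mutually orthogonal."""
    P = mat_mul(transpose(A), A)
    n = len(A)
    return all(abs(P[i][j] - (1.0 if i == j else 0.0)) < tol
               for i in range(n) for j in range(n))


phi = math.radians(30)
R = [[math.cos(phi), -math.sin(phi), 0.0],   # rotation by 30 degrees about the z-axis
     [math.sin(phi),  math.cos(phi), 0.0],
     [0.0,            0.0,           1.0]]
print(is_orthogonal(R))                         # True
print(is_orthogonal([[1.0, 1.0], [0.0, 1.0]]))  # False
```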


7.3 The Determinant 

The determinant of an $n\times n$ matrix is obtained in terms of cofactors in exactly the same way as in the case of $3\times 3$ matrices. The cofactors are themselves determinants of $(n-1)\times(n-1)$ matrices which can be expanded in terms of cofactors of their elements, which are determinants of $(n-2)\times(n-2)$ matrices, etc. Continuing this process, we finally end up with determinants of $2\times 2$ matrices. The determinant is also related to the inverse of a matrix [see Equations (6.60) and (6.61)]:

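Theorem 7.3.1. The matrix $\mathsf{A}$ has an inverse if and only if $\det\mathsf{A} \ne 0$, in which case

$$
\mathsf{A}^{-1} = \frac{1}{\det\mathsf{A}}
\begin{pmatrix}
A_{11} & A_{21} & \cdots & A_{n1}\\
A_{12} & A_{22} & \cdots & A_{n2}\\
\vdots & \vdots & \ddots & \vdots\\
A_{1n} & A_{2n} & \cdots & A_{nn}
\end{pmatrix},
\tag{7.13}
$$

where $A_{ij}$ is the cofactor of $a_{ij}$ as defined in Box 6.3.2.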
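Theorem 7.3.1 and the cofactor expansion can be sketched recursively. This is meant only for small matrices: the recursion takes on the order of $n!$ operations, which is one way to see why the calculation becomes cumbersome for larger $n$. The helper names and the sample matrix are illustrative; `Fraction` keeps the inverse exact.

```python
from fractions import Fraction


def minor(A, i, j):
    """The matrix A with row i and column j deleted."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]


def det(A):
    """Determinant by cofactor expansion along the first row."""
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** j * A[0][j] * det(minor(A, 0, j)) for j in range(len(A)))


def cofactor(A, i, j):
    """Cofactor of the entry a_ij (0-based indices)."""
    return (-1) ** (i + j) * det(minor(A, i, j))


def inverse(A):
    """Equation (7.13): entry (i, j) of the inverse is the cofactor of a_ji over det A."""
    d = det(A)
    if d == 0:
        raise ValueError("det A = 0, so A has no inverse")
    n = len(A)
    return [[Fraction(cofactor(A, j, i)) / d for j in range(n)] for i in range(n)]


A = [[2, 1, 0],
     [1, 3, 1],
     [0, 1, 4]]
A_inv = inverse(A)
print(det(A))    # 18
print(A_inv[0])  # [Fraction(11, 18), Fraction(-2, 9), Fraction(1, 18)]
```

The same `det` can be used to try out the properties listed below, for instance that interchanging two rows changes the sign.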
Calculation of the determinant becomes extremely cumbersome when the dimension of the matrix increases beyond 4 or 5. However, there are certain properties of the determinant which may sometimes facilitate its calculation. The determinant has the following properties:

1. To obtain the determinant of an $n\times n$ matrix, multiply each element of one row (or one column) by its cofactor and then add the results.

2. The determinant of the unit matrix is 1.

3. The determinant of a matrix is equal to the determinant of its transpose: $\det\mathsf{A} = \det\tilde{\mathsf{A}}$.

4. If two rows (or two columns) of a matrix are proportional (in particular, equal), the determinant of the matrix is zero.

5. If a row or column (treated as a vector in $\mathbb{R}^n$) of a matrix is multiplied by a constant, the determinant of the matrix will be multiplied by the same constant.

6. If two rows (or two columns) of a matrix are interchanged, the determinant changes sign.

7. The determinant will not change if we add to one row (or one column) a multiple of another row (or another column). The addition of rows or columns and their multiplication by numbers are to be understood as operations in $\mathbb{R}^n$.

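An important relation, which we state without proof,³ is

$$\det(\mathsf{AB}) = \det\mathsf{A}\,\det\mathsf{B}. \tag{7.14}$$

This, in combination with $\det\mathsf{1} = 1$ and $\mathsf{AA}^{-1} = \mathsf{1}$, gives

$$\det(\mathsf{AA}^{-1}) = \det\mathsf{1} \;\Rightarrow\; \det\mathsf{A}\,\det(\mathsf{A}^{-1}) = 1 \;\Rightarrow\; \det(\mathsf{A}^{-1}) = \frac{1}{\det\mathsf{A}}. \tag{7.15}$$

In words, the determinant of the inverse of a matrix is the inverse of its determinant.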

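Recall that an orthogonal matrix $\mathsf{A}$ satisfies $\mathsf{A}\tilde{\mathsf{A}} = \mathsf{1}$. The third property of the determinant given above and (7.14) can be used to obtain

$$\det(\mathsf{A}\tilde{\mathsf{A}}) = \det\mathsf{1} \;\Rightarrow\; (\det\mathsf{A})^2 = 1 \;\Rightarrow\; \det\mathsf{A} = \pm 1. \tag{7.16}$$

So,

Box 7.3.1. The determinant of an orthogonal matrix is either $+1$ or $-1$.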


³ See Hassani, S. Mathematical Physics: A Modern Introduction to Its Foundations, Springer-Verlag, 1999, Chapters 3 and 25.


7.4 Eigenvectors and Eigenvalues 

One of the most important applications of the determinant is in finding certain vectors that are not affected by transformations. As an example, consider rotation, which is a linear transformation of space onto itself (or a transformation from $\mathbb{R}^3$ to $\mathbb{R}^3$). A general rotation in space is very complicated (see Example 6.2.5 and the discussion immediately preceding it), but if we can find an axis which is unaffected by the operation, then the process becomes a simple rotation about this axis.

When we say that a vector is unaffected, we mean that its direction (and not necessarily its magnitude) is unchanged. We use $n\times n$ matrices to represent transformations of $\mathbb{R}^n$. If $\mathbf{x}$ is a (column) vector in $\mathbb{R}^n$ whose direction is not affected by the transformation $\mathsf{T}$, then we can write

$$\mathsf{T}\mathbf{x} = \lambda\mathbf{x} \quad\text{or}\quad (\mathsf{T} - \lambda\mathsf{1})\mathbf{x} = \mathbf{0}, \tag{7.17}$$

where $\lambda$ is a real number and we introduced the unit matrix to give meaning to the subtraction of $\lambda$ from $\mathsf{T}$. In Equation (7.17), $\mathbf{x}$ is called the eigenvector and $\lambda$ the eigenvalue of the linear transformation. Since the zero vector trivially satisfies (7.17), we demand that eigenvectors always be nonzero. Equation (7.17) itself is called an eigenvalue equation; its solution involves calculating both the eigenvalues and the eigenvectors. It is clear from (7.17) that a multiple of an eigenvector is also an eigenvector (see Problem 7.6). Therefore, an eigenvalue equation (7.17) has no unique solution. By convention, we normalize eigenvectors so that their length is unity.

To find the solution to (7.17), we note that the matrix $(\mathsf{T} - \lambda\mathsf{1})$ must have no inverse, because if it did, then we could multiply both sides of the equation by $(\mathsf{T} - \lambda\mathsf{1})^{-1}$ and obtain

$$\underbrace{(\mathsf{T} - \lambda\mathsf{1})^{-1}(\mathsf{T} - \lambda\mathsf{1})}_{=\mathsf{1}}\,\mathbf{x} = \underbrace{(\mathsf{T} - \lambda\mathsf{1})^{-1}\mathbf{0}}_{=\mathbf{0}} \;\Rightarrow\; \mathbf{x} = \mathbf{0},$$

which is not an acceptable solution. So, we must demand that the matrix $(\mathsf{T} - \lambda\mathsf{1})$ have no inverse. This will happen only if the determinant of this matrix vanishes. So, the problem is reduced to finding those $\lambda$'s which make the determinant of the matrix vanish. In other words, the eigenvalues are the solutions of the equation

$$\det(\mathsf{T} - \lambda\mathsf{1}) = 0. \tag{7.18}$$

Once the eigenvalues are determined, we substitute them one by one in the 
matrix equation (7.17) and find the corresponding eigenvectors by solving the 
resulting n linear equations in n unknowns. The best way to explain this is 
through an example. 

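Example 7.4.1. Let $\mathsf{T}$ be a linear transformation of space (or $\mathbb{R}^3$) represented by the matrix

$$\mathsf{T} = \begin{pmatrix} 1 & 0 & 0\\ 0 & 1 & 2\\ 0 & 2 & 1\end{pmatrix}.$$

The eigenvalue equation is

$$(\mathsf{T} - \lambda\mathsf{1})\mathbf{x} = \mathbf{0} \quad\text{or}\quad \left[\begin{pmatrix} 1 & 0 & 0\\ 0 & 1 & 2\\ 0 & 2 & 1\end{pmatrix} - \lambda\begin{pmatrix} 1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & 1\end{pmatrix}\right]\begin{pmatrix} x_1\\ x_2\\ x_3\end{pmatrix} = \begin{pmatrix} 0\\ 0\\ 0\end{pmatrix}.$$

This can also be written as

$$\begin{pmatrix} 1-\lambda & 0 & 0\\ 0 & 1-\lambda & 2\\ 0 & 2 & 1-\lambda\end{pmatrix}\begin{pmatrix} x_1\\ x_2\\ x_3\end{pmatrix} = \begin{pmatrix} 0\\ 0\\ 0\end{pmatrix}, \tag{7.19}$$

whose nontrivial solution is obtained by setting the determinant of the matrix equal to zero:

$$\det\begin{pmatrix} 1-\lambda & 0 & 0\\ 0 & 1-\lambda & 2\\ 0 & 2 & 1-\lambda\end{pmatrix} = 0$$

or

$$(1-\lambda)\det\begin{pmatrix} 1-\lambda & 2\\ 2 & 1-\lambda\end{pmatrix} = (1-\lambda)\left[(1-\lambda)^2 - 4\right] = 0.$$

This equation has the solutions

$$1 - \lambda = 0 \quad\text{or}\quad (1-\lambda)^2 = 4 \;\Rightarrow\; 1 - \lambda = \pm 2.$$

It follows that there are three eigenvalues: $\lambda_1 = 1$, $\lambda_2 = -1$, and $\lambda_3 = 3$. We now find the eigenvectors corresponding to each eigenvalue.

Substituting $\lambda_1 = 1$ for $\lambda$ in Equation (7.19) yields

$$\begin{pmatrix} 0 & 0 & 0\\ 0 & 0 & 2\\ 0 & 2 & 0\end{pmatrix}\begin{pmatrix} x_1\\ x_2\\ x_3\end{pmatrix} = \begin{pmatrix} 0\\ 0\\ 0\end{pmatrix} \quad\text{or}\quad \begin{pmatrix} 0\\ 2x_3\\ 2x_2\end{pmatrix} = \begin{pmatrix} 0\\ 0\\ 0\end{pmatrix}.$$

It follows that $x_2 = 0 = x_3$. Therefore, the first eigenvector is

$$\mathbf{a}_1 = \begin{pmatrix} x_1\\ 0\\ 0\end{pmatrix}$$

with $x_1$ an arbitrary real number. This arbitrariness comes from the fact that a multiple of an eigenvector is also an eigenvector. We choose $x_1 = 1$ to normalize the eigenvector to unit length. Denoting this eigenvector by $\mathbf{e}_1$, we have

$$\mathbf{e}_1 = \begin{pmatrix} 1\\ 0\\ 0\end{pmatrix}.$$

To find the second eigenvector, we substitute $\lambda_2 = -1$ for $\lambda$ in Equation (7.19). This gives

$$\begin{pmatrix} 2 & 0 & 0\\ 0 & 2 & 2\\ 0 & 2 & 2\end{pmatrix}\begin{pmatrix} x_1\\ x_2\\ x_3\end{pmatrix} = \begin{pmatrix} 0\\ 0\\ 0\end{pmatrix} \quad\text{or}\quad \begin{pmatrix} 2x_1\\ 2x_2 + 2x_3\\ 2x_2 + 2x_3\end{pmatrix} = \begin{pmatrix} 0\\ 0\\ 0\end{pmatrix}.$$

It follows that $x_1 = 0$, and $2x_2 + 2x_3 = 0$ or $x_3 = -x_2$. Therefore, the second eigenvector is

$$\mathbf{a}_2 = \begin{pmatrix} 0\\ x_2\\ -x_2\end{pmatrix}$$

with $x_2$ arbitrary. To normalize the eigenvector, we divide it by its length.⁴ This amounts to choosing $x_2 = 1/\sqrt{2}$ (see Problem 7.7). We thus have

$$\mathbf{e}_2 = \begin{pmatrix} 0\\ 1/\sqrt{2}\\ -1/\sqrt{2}\end{pmatrix}.$$

For the third eigenvector, we substitute $\lambda_3 = 3$ in Equation (7.19) to obtain

$$\begin{pmatrix} -2 & 0 & 0\\ 0 & -2 & 2\\ 0 & 2 & -2\end{pmatrix}\begin{pmatrix} x_1\\ x_2\\ x_3\end{pmatrix} = \begin{pmatrix} 0\\ 0\\ 0\end{pmatrix} \quad\text{or}\quad \begin{pmatrix} -2x_1\\ -2x_2 + 2x_3\\ 2x_2 - 2x_3\end{pmatrix} = \begin{pmatrix} 0\\ 0\\ 0\end{pmatrix},$$

or $x_1 = 0$, and $x_3 = x_2$. Therefore, the third eigenvector is

$$\mathbf{a}_3 = \begin{pmatrix} 0\\ x_2\\ x_2\end{pmatrix}$$

with $x_2$ arbitrary. To normalize the eigenvector, we divide it by its length and get

$$\mathbf{e}_3 = \begin{pmatrix} 0\\ 1/\sqrt{2}\\ 1/\sqrt{2}\end{pmatrix}.$$

■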
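A quick numerical check of Example 7.4.1: each $\lambda$ should make $\det(\mathsf{T} - \lambda\mathsf{1})$ vanish, each unit vector should satisfy $\mathsf{T}\mathbf{e} = \lambda\mathbf{e}$, and, since $\mathsf{T}$ is symmetric, the three eigenvectors should be mutually orthogonal, as Box 7.4.1 asserts. This is a sketch with floating-point components; the helper names are illustrative.

```python
import math

T = [[1, 0, 0],
     [0, 1, 2],
     [0, 2, 1]]


def char_poly(lam):
    """det(T - lam 1), expanded along the first row of the 3 x 3 matrix."""
    M = [[T[i][j] - (lam if i == j else 0) for j in range(3)] for i in range(3)]
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
            - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
            + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))


def mat_vec(A, x):
    return [sum(a * x_i for a, x_i in zip(row, x)) for row in A]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


s = 1 / math.sqrt(2)
eigenpairs = [(1, [1, 0, 0]), (-1, [0, s, -s]), (3, [0, s, s])]

for lam, e in eigenpairs:
    # largest component of T e - lam e; should be zero up to round-off
    residual = max(abs(te - lam * x) for te, x in zip(mat_vec(T, e), e))
    print(lam, char_poly(lam), residual)
```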



The unit eigenvectors $\mathbf{e}_1$, $\mathbf{e}_2$, and $\mathbf{e}_3$ of the preceding example are mutually perpendicular, as the reader may easily verify. This is no accident! The matrix of that example happens to be symmetric, and for such matrices, we have the following general property:


Box 7.4.1. Eigenvectors of a symmetric matrix corresponding to different 
eigenvalues are orthogonal. 


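To show this, let $\mathbf{x}$ and $\mathbf{y}$ be eigenvectors of a symmetric matrix $\mathsf{T}$ corresponding to eigenvalues $\lambda$ and $\lambda'$, respectively:

$$\mathsf{T}\mathbf{x} = \lambda\mathbf{x}, \qquad \mathsf{T}\mathbf{y} = \lambda'\mathbf{y}.$$

Multiply both sides of the first equation on the left by $\tilde{\mathbf{y}}$ and the second by $\tilde{\mathbf{x}}$ to get

$$\tilde{\mathbf{y}}\mathsf{T}\mathbf{x} = \lambda\tilde{\mathbf{y}}\mathbf{x}, \qquad \tilde{\mathbf{x}}\mathsf{T}\mathbf{y} = \lambda'\tilde{\mathbf{x}}\mathbf{y}. \tag{7.20}$$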

⁴ Here we are assuming that the inner product for the calculation of length is the usual Euclidean one.
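Now take the transpose of both sides of the first equation in (7.20). This gives⁵

$$(\tilde{\mathbf{y}}\mathsf{T}\mathbf{x})^t = \lambda(\tilde{\mathbf{y}}\mathbf{x})^t \;\Rightarrow\; \tilde{\mathbf{x}}\tilde{\mathsf{T}}\mathbf{y} = \lambda\tilde{\mathbf{x}}\mathbf{y},$$

because double transposing $\mathbf{y}$ gives back $\mathbf{y}$. Furthermore, $\tilde{\mathsf{T}} = \mathsf{T}$, because $\mathsf{T}$ is symmetric. So,

$$\tilde{\mathbf{x}}\mathsf{T}\mathbf{y} = \lambda\tilde{\mathbf{x}}\mathbf{y}.$$

Subtracting both sides of this equation from those of the second equation in (7.20), we obtain

$$0 = (\lambda' - \lambda)\,\tilde{\mathbf{x}}\mathbf{y}.$$

By assumption, $\lambda \ne \lambda'$; so, we must have $\tilde{\mathbf{x}}\mathbf{y} = 0$, i.e., $\mathbf{x}$ and $\mathbf{y}$ are orthogonal.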

7.5 Orthogonal Polynomials 

The last section generalized the two- and three-dimensional “arrows” and polynomials to higher dimensions in which many of the original properties of vectors, such as the inner product, were retained. In this section, we want to make two more generalizations which are necessary for many physical applications. The first is the introduction of a weight function in the definition of inner product. A weight function is a function that is positive definite⁶ in the interval $(a, b)$ of integration of the inner product. More specifically, let $\mathbf{p} = p(t)$ and $\mathbf{q} = q(t)$ be polynomials in $\mathcal{P}_n[t]$. We define their inner product as

$$\mathbf{p}\cdot\mathbf{q} = \int_a^b p(t)q(t)w(t)\,dt, \tag{7.21}$$

where $w(t)$ is a function that is never zero or negative for $a < t < b$, and its form is usually dictated by the physical application. The reader may verify that Equation (7.21) defines a positive definite inner product.

The second generalization is to consider the collection of all polynomials of arbitrary degree. In other words, instead of confining ourselves to $\mathcal{P}_n[t]$ for some fixed $n$, we shall allow all polynomials without any restriction on their degree. Clearly, such a collection is indeed a vector space; however, it does not have a finite basis. We denote this infinite-dimensional space by $\mathcal{P}^{w}_{(a,b)}[t]$, in which notation both the weight function and the interval of integration are included.

Given any basis for $\mathcal{P}^{w}_{(a,b)}[t]$, we can apply the Gram-Schmidt process on it to turn it into an orthonormal basis. For historical reasons, normality is not a desirable property for the basis vectors. So, one seeks polynomials that are orthogonal, but not necessarily of unit length. Instead of normalizing the vectors, one standardizes them. Standardization is a rule, dictated by tradition, that fixes some of the coefficients of the polynomials. The procedure for finding these orthogonal polynomials is to start from the constant polynomial (of degree zero) and standardize it to get the first polynomial. Next apply the standardization to the polynomial of degree one (with two unknown coefficients), and make sure that it is perpendicular to the first polynomial, where the inner product is defined by (7.21). These two requirements (standardization and perpendicularity) provide two equations in two unknowns which can be solved to find the coefficients of the second polynomial. The next polynomial has degree two with three unknown coefficients. Standardization and orthogonality to the first two polynomials provide three equations in three unknowns, the solution of which determines the third polynomial. This process can be continued indefinitely, determining the coefficients of orthogonal polynomials up to any desired degree.

⁵ Recall that ~ and t mean the same thing.

⁶ This just means that the function is positive and never zero.

Example 7.5.1. The procedure above is best illustrated by a concrete example. The Legendre polynomial of degree $n$, denoted by $P_n(t)$, is characterized by the standardization $P_n(1) = 1$. We denote the collection of these polynomials by $\mathcal{P}_{(-1,1)}[t]$, indicating that the interval of integration for them is from $-1$ to $+1$ and that the weight function is unity. Because of standardization, we must choose $P_0(t) = 1$. The first-degree polynomial is generally written as $P_1(t) = a_0 + a_1t$. Standardization gives $a_0 + a_1 = 1$. Orthogonality to $P_0(t)$ gives

$$0 = \int_{-1}^{1} P_0(t)P_1(t)w(t)\,dt = \int_{-1}^{1} 1\cdot(a_0 + a_1t)\cdot 1\,dt = 2a_0.$$

So, $a_0 = 0$ and $a_1 = 1$. Therefore, $P_1(t) = t$.

For $P_2(t) = a_0 + a_1t + a_2t^2$ we have (reader please verify!)

$$
\begin{aligned}
a_0 + a_1 + a_2 &= 1 && \text{(by standardization)},\\
2a_0 + 0\cdot a_1 + \tfrac{2}{3}a_2 &= 0 && \text{(by orthogonality to } P_0),\\
0\cdot a_0 + \tfrac{2}{3}a_1 + 0\cdot a_2 &= 0 && \text{(by orthogonality to } P_1).
\end{aligned}
$$

The solution to these equations is $a_0 = -\tfrac{1}{2}$, $a_1 = 0$, and $a_2 = \tfrac{3}{2}$, so that $P_2(t) = \tfrac{1}{2}(3t^2 - 1)$. Other Legendre polynomials can be found analogously. ■
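Example 7.5.1 solves a small linear system for each degree. An equivalent route, sketched here under the assumption that polynomials are stored as coefficient lists $[c_0, c_1, \ldots]$, is to apply the Gram-Schmidt process to $1, t, t^2, \ldots$ without normalizing, and then rescale each result so that $P_n(1) = 1$. Since orthogonality to all lower degrees fixes each polynomial up to a constant factor and standardization fixes that factor, both routes give the same $P_n$. The function names are illustrative.

```python
from fractions import Fraction


def poly_mul(p, q):
    """Product of polynomials given as coefficient lists [c0, c1, c2, ...]."""
    r = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return r


def integrate(p, a=-1, b=1):
    """Exact integral over (a, b) of the polynomial with coefficients p."""
    a, b = Fraction(a), Fraction(b)
    return sum(Fraction(c) * (b ** (k + 1) - a ** (k + 1)) / (k + 1)
               for k, c in enumerate(p))


def inner(p, q):
    """p . q on the interval (-1, 1) with weight w(t) = 1, as in Example 7.5.1."""
    return integrate(poly_mul(p, q))


def legendre(N):
    """P_0, ..., P_N by orthogonalizing 1, t, t^2, ... and then imposing P_n(1) = 1."""
    polys = []
    for n in range(N + 1):
        p = [Fraction(0)] * n + [Fraction(1)]      # start from t^n
        for q in polys:                            # remove components along earlier P_k
            c = inner(p, q) / inner(q, q)
            p = [p_i - c * (q[i] if i < len(q) else 0) for i, p_i in enumerate(p)]
        p_at_1 = sum(p)                            # p(1) is the sum of the coefficients
        polys.append([coef / p_at_1 for coef in p])
    return polys


P = legendre(3)
print(P[2])   # [Fraction(-1, 2), Fraction(0, 1), Fraction(3, 2)]
```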

By their very construction, orthogonal polynomials, which are denoted by $F_n(t)$, satisfy the following orthogonality condition:

$$\int_a^b F_n(t)F_m(t)w(t)\,dt = \begin{cases} 0 & \text{if } m \ne n,\\ h_n & \text{if } m = n,\end{cases} \tag{7.22}$$

where $h_n$ is just a positive number (depending on $n$, of course) which is different for different types of $F_n$.⁷ As before, let us treat these polynomials as vectors and write $\mathbf{F}_n$ for $F_n(t)$. Then, using the Kronecker delta of (7.9), Equation (7.22) can be written as

$$\mathbf{F}_n\cdot\mathbf{F}_m = \begin{cases} 0 & \text{if } m \ne n,\\ h_n & \text{if } m = n\end{cases} = h_n\,\delta_{mn}.$$

⁷ There are many different types of orthogonal polynomials, distinguished from each other by different intervals and different $w(t)$. Different symbols, such as $P_n(t)$, $H_n(t)$, $T_n(t)$, etc., are used for different types. We have used $F_n(t)$ to represent any one of these types in our general discussion.
In particular, $\mathbf{F}_n\cdot\mathbf{F}_n = h_n$ or $|\mathbf{F}_n|^2 = h_n$. So, the “length” of $\mathbf{F}_n$ is $\sqrt{h_n}$. Now consider the set of all functions (not just polynomials) defined in the interval $(a, b)$, any two of which give a finite result when integrated as in Equation (7.22). The reader may easily verify that this set is indeed a vector space. If $\mathbf{f} = f(t)$ and $\mathbf{g} = g(t)$ are two vectors in this space, then we define their inner product as

$$\mathbf{f}\cdot\mathbf{g} = \int_a^b f(t)g(t)w(t)\,dt. \tag{7.23}$$


It is clear that the $\mathbf{F}_n$ belong to this space. Furthermore, it can be shown that they form a convenient basis for the vector space. In fact, any function of the space can be written as an (infinite) linear combination of the orthogonal polynomials

$$\mathbf{f} = \sum_{n=0}^{\infty}\alpha_n\mathbf{F}_n,$$

whose coefficients can be determined by taking the inner product of both sides with $\mathbf{F}_m$:

$$\mathbf{f}\cdot\mathbf{F}_m = \left(\sum_{n=0}^{\infty}\alpha_n\mathbf{F}_n\right)\cdot\mathbf{F}_m = \sum_{n=0}^{\infty}\alpha_n\,\mathbf{F}_n\cdot\mathbf{F}_m = \alpha_m\,\mathbf{F}_m\cdot\mathbf{F}_m = h_m\alpha_m,$$

because in the last infinite sum all the terms are zero except one. We can solve this equation for $\alpha_m$ to obtain $\alpha_m = \mathbf{f}\cdot\mathbf{F}_m/h_m$. Thus,

$$\mathbf{f} = \sum_{n=0}^{\infty}\alpha_n\mathbf{F}_n, \quad\text{where}\quad \alpha_n = \frac{\mathbf{f}\cdot\mathbf{F}_n}{h_n}. \tag{7.24}$$

In terms of functions and polynomials, we have the important result:

Theorem 7.5.2. A function $f(t)$, defined in the interval $(a, b)$, can be represented as an infinite sum in orthogonal polynomials given by

$$f(t) = \sum_{n=0}^{\infty}\alpha_n F_n(t), \quad\text{where}\quad \alpha_n = \frac{1}{h_n}\int_a^b f(t)F_n(t)w(t)\,dt. \tag{7.25}$$

There are a number of so-called classical orthogonal polynomials used in mathematical physics, a number of whose properties we simply cite here. We have already mentioned Legendre polynomials, for which the interval is $(-1, +1)$ and $w(t) = 1$.⁸ For Legendre polynomials, $h_n = 2/(2n+1)$, i.e.,

$$\int_{-1}^{1} P_n(t)P_m(t)\,dt = \begin{cases} 0 & \text{if } m \ne n,\\ \dfrac{2}{2n+1} & \text{if } m = n\end{cases} = \frac{2}{2n+1}\,\delta_{mn}. \tag{7.26}$$

If the interval is $(-\infty, \infty)$ and $w(t) = e^{-t^2}$, then the resulting polynomials, denoted by $H_n(t)$, are called Hermite polynomials. For Hermite polynomials, we have

$$\int_{-\infty}^{\infty} H_n(t)H_m(t)\,e^{-t^2}\,dt = \begin{cases} 0 & \text{if } m \ne n,\\ \sqrt{\pi}\,2^n n! & \text{if } m = n\end{cases} = \sqrt{\pi}\,2^n n!\,\delta_{mn}. \tag{7.27}$$

If the interval is $(0, \infty)$ and $w(t) = t^m e^{-t}$ with $m$ a positive integer,⁹ then the resulting polynomials, denoted by $L_n^m(t)$, are called Laguerre polynomials. For Laguerre polynomials, we have

$$\int_0^{\infty} L_n^m(t)L_k^m(t)\,t^m e^{-t}\,dt = \begin{cases} 0 & \text{if } k \ne n,\\ \dfrac{(n+m)!}{n!} & \text{if } k = n\end{cases} = \frac{(n+m)!}{n!}\,\delta_{kn}. \tag{7.28}$$

⁸ A detailed discussion of Legendre polynomials and their origin can be found in Chapter 26.


There are other (classical) orthogonal polynomials which we shall not investigate here.¹⁰


7.6 Systems of Linear Equations 


Our discussion of determinants in Section 6.3 started with a system of two linear equations in two unknowns and led to the result that if the determinant of the matrix of coefficients is nonzero, then the inverse of this matrix exists, and the unknowns can be found conveniently using this inverse [see Equation (6.52) and Theorem 6.3.1]. This was further generalized to the case of three linear equations in three unknowns and stated in Equation (6.59). A system of $n$ linear equations in $n$ unknowns can be handled in the same way. We write such a system as

$$
\begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n}\\
a_{21} & a_{22} & \cdots & a_{2n}\\
\vdots & \vdots & \ddots & \vdots\\
a_{n1} & a_{n2} & \cdots & a_{nn}
\end{pmatrix}
\begin{pmatrix} x_1\\ x_2\\ \vdots\\ x_n\end{pmatrix}
=
\begin{pmatrix} b_1\\ b_2\\ \vdots\\ b_n\end{pmatrix}
\quad\text{or}\quad \mathsf{A}\mathbf{x} = \mathbf{b}.
\tag{7.29}
$$

If $\det\mathsf{A} \ne 0$, we can calculate $\mathsf{A}^{-1}$ according to Theorem 7.3.1 and multiply both sides of (7.29) by this inverse to obtain $\mathbf{x} = \mathsf{A}^{-1}\mathbf{b}$. The case of the vanishing determinant is best treated in the context of a system of equations for which the number of unknowns is not equal to the number of equations.

The process that led to Equations (6.49) and (6.51) is called elimination, and can be extended to $m$ linear equations in $n$ unknowns of the form

$$
\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= b_1,\\
a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= b_2,\\
&\;\;\vdots\\
a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n &= b_m.
\end{aligned}
\tag{7.30}
$$

⁹ Actually $m$ need not be an integer. However, the space and scope of this book do not permit us to consider the general case.

¹⁰ The interested reader may find Hassani, S. Mathematical Physics: A Modern Introduction to Its Foundations, Springer-Verlag, 1999, Chapter 7, a useful reference for all orthogonal polynomials, including many derivations and proofs that we have skipped here.


We will now describe a general process, known as Gauss elimination, for finding all solutions of the given system of linear equations. The idea is to replace the given system by a simpler system, which is equivalent to the original system in the sense that it has precisely the same solutions. For example, the degenerate equation

$$0\cdot x_1 + 0\cdot x_2 + \cdots + 0\cdot x_n = b_j$$

is equivalent to $0 = b_j$, which cannot be satisfied unless $b_j$ is zero.

In a more compact notation, we write only the $i$th equation, indicating its form by a sample term $a_{ij}x_j$ and the statement that the equation is to be summed over $j$ from 1 to $n$ by writing¹¹

$$\sum_{j=1}^{n} a_{ij}x_j = b_i \quad\text{for } i = 1, 2, \ldots, m. \tag{7.31}$$

We distinguish two cases:

1. Every $a_{i1} = 0$, i.e., all coefficients of the unknown $x_1$ vanish. Then, trivially, the system (7.31) is equivalent to a smaller system of $m$ equations in the $n-1$ unknowns $x_2, \ldots, x_n$, with $x_1$ arbitrary for any solution of the smaller system.

2. Some $a_{i1} \ne 0$. By interchanging the first equation with another if necessary, we get an equivalent system with $a_{11} \ne 0$. Dividing the first equation by $a_{11}$, we then get an equivalent system in which $a_{11} = 1$. Then subtracting $a_{i1}$ times the new first equation from each $i$th equation for $i = 2, \ldots, m$, we get an equivalent system of the form

$$
\begin{aligned}
x_1 + a'_{12}x_2 + a'_{13}x_3 + \cdots + a'_{1n}x_n &= b'_1,\\
a'_{22}x_2 + a'_{23}x_3 + \cdots + a'_{2n}x_n &= b'_2,\\
&\;\;\vdots\\
a'_{m2}x_2 + a'_{m3}x_3 + \cdots + a'_{mn}x_n &= b'_m.
\end{aligned}
\tag{7.32}
$$

Now we apply the same procedure to the system of equations in (7.32) involving only $x_2$ through $x_n$, so that $x_2$ will appear only in the first of these equations. If case 2 always arises, the given system is said to be compatible. If case 1 arises once in a while, then we may get degenerate equations of the form $0 = d_k$. If all $d_k$ turn out to be zero, these can be ignored; if one $d_k \ne 0$, the original system (7.30) is incompatible (has no solutions). We summarize these findings as

¹¹ The reader may find an adequate discussion of summations and “dummy” indices in Section 9.2.




Theorem 7.6.1. Any system (7.30) of $m$ linear equations in $n$ unknowns can be reduced to an equivalent system of $r$ linear equations whose $i$th equation has the form

$$x_i + c_{i,i+1}x_{i+1} + c_{i,i+2}x_{i+2} + \cdots + c_{in}x_n = d_i \tag{7.33}$$

plus $m - r$ equations of the form $0 = d_k$.

Written out in full, Equation (7.33) looks like

$$
\begin{aligned}
x_1 + c_{12}x_2 + c_{13}x_3 + c_{14}x_4 + \cdots + c_{1n}x_n &= d_1,\\
x_2 + c_{23}x_3 + c_{24}x_4 + \cdots + c_{2n}x_n &= d_2,\\
x_3 + c_{34}x_4 + \cdots + c_{3n}x_n &= d_3,\\
&\;\;\vdots\\
x_r + \cdots + c_{rn}x_n &= d_r \quad (r \le m),
\end{aligned}
\tag{7.34}
$$

which is said to be in echelon form.

Solutions of any system of the echelon form (7.34) are easily described. Consider the succession of the unknowns starting with $x_n$ and going down to $x_1$. If a given $x_i$ appears as the first variable in an equation of (7.34), then it can be written in terms of all preceding unknowns:¹²

$$x_i = d_i - c_{i,i+1}x_{i+1} - c_{i,i+2}x_{i+2} - \cdots - c_{in}x_n. \tag{7.35}$$

If $x_i$ does not appear as the first variable in an equation of (7.34), then it can be chosen arbitrarily. We thus have


Box 7.6.1. In the compatible case of Theorem 7.6.1, the set of all solutions of Equation (7.30) is determined as follows. The $n - r$ unknowns $x_k$ that do not occur as the first variable of an equation in (7.34) can be chosen arbitrarily (they are free parameters). For any choice of these $x_k$'s, the remaining $x_i$ can be computed by substituting in (7.35).


Example 7.6.2. Consider the following four linear equations in three unknowns
(so $m = 4$ and $n = 3$):

$$\begin{aligned}
-x_2 + 2x_3 &= 1,\\
x_1 + x_2 - 3x_3 &= 0,\\
-x_1 + x_2 + x_3 &= -2,\\
x_1 + 2x_2 - x_3 &= -1.
\end{aligned} \tag{7.36}$$

12 If $r = n$, then the last equation of (7.34) will be $x_n = d_r$, and (if the set of equations
is compatible) all unknowns will be determined.

The coefficient of $x_1$ in the first equation is zero. So, we switch this equation with
one of the other equations, say the second. Then we multiply the new first equation
by the negative of the coefficient of $x_1$ in each remaining equation and add the result
to that equation to eliminate $x_1$. Thus, we add the new first equation to the third
equation of (7.36), and subtract the new first equation from the last equation of
(7.36). The result is


$$\begin{aligned}
x_1 + x_2 - 3x_3 &= 0,\\
-x_2 + 2x_3 &= 1,\\
2x_2 - 2x_3 &= -2,\\
x_2 + 2x_3 &= -1.
\end{aligned} \tag{7.37}$$

To eliminate $x_2$ from the last two equations, multiply the second equation of (7.37)
by 2 (or by 1 for the last equation) and add it to the third (or last) equation. This will yield

$$\begin{aligned}
x_1 + x_2 - 3x_3 &= 0,\\
-x_2 + 2x_3 &= 1,\\
2x_3 &= 0,\\
4x_3 &= 0.
\end{aligned} \tag{7.38}$$

Multiply the second equation by $-1$, divide the third equation in (7.38) by 2, and
finally subtract 4 times the result from the last equation. The final result is the following
echelon form:


$$\begin{aligned}
x_1 + x_2 - 3x_3 &= 0,\\
x_2 - 2x_3 &= -1,\\
x_3 &= 0,\\
0 &= 0,
\end{aligned} \tag{7.39}$$

which corresponds to Equation (7.34) with $r = n = 3$. Thus, we have one equation
of the form $0 = d_k$, and for it $d_k$ is zero. So, the system is compatible, and since $r = n$,
its solution is unique. To find this solution, start with the third equation of (7.39), which gives $x_3 = 0$.
Substitute this in the equation above it to get $x_2 = -1$, and these values in the first equation to
obtain $x_1 = 1$. ■

Example 7.6.3. As another example, consider the following:

$$\begin{aligned}
x_1 + x_2 + x_3 &= 0,\\
2x_1 - x_2 + x_3 &= -2,\\
-x_1 + 2x_2 + x_3 &= -1,\\
x_1 - 2x_2 + x_3 &= 2.
\end{aligned} \tag{7.40}$$

Multiply the first equation successively by $-2$, 1, and $-1$ and add it to the second,
third, and fourth equations. The result will be

$$\begin{aligned}
x_1 + x_2 + x_3 &= 0,\\
-3x_2 - x_3 &= -2,\\
3x_2 + 2x_3 &= -1,\\
-3x_2 + 0\cdot x_3 &= 2.
\end{aligned}$$


Now divide the second equation by $-3$:

$$\begin{aligned}
x_1 + x_2 + x_3 &= 0,\\
x_2 + \tfrac{1}{3}x_3 &= \tfrac{2}{3},\\
3x_2 + 2x_3 &= -1,\\
-3x_2 + 0\cdot x_3 &= 2.
\end{aligned} \tag{7.41}$$

Multiply the second equation of (7.41) successively by $-3$ and $+3$ and add it to the
third and last equations. This will yield

$$\begin{aligned}
x_1 + x_2 + x_3 &= 0,\\
x_2 + \tfrac{1}{3}x_3 &= \tfrac{2}{3},\\
x_3 &= -3,\\
x_3 &= 4.
\end{aligned}$$

Subtract the third equation from the last to get

$$\begin{aligned}
x_1 + x_2 + x_3 &= 0,\\
x_2 + \tfrac{1}{3}x_3 &= \tfrac{2}{3},\\
x_3 &= -3,\\
0 &= 7.
\end{aligned}$$

In this case, we have an equation of the form $0 = d_k$ for which $d_k = 7$. So, the
system is incompatible, i.e., it has no solution. ■
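The elimination steps used in Examples 7.6.2 and 7.6.3 (switch equations when a coefficient vanishes, scale the leading coefficient to 1, subtract multiples to clear that unknown from the equations below) can be collected into one routine. The sketch below is one possible Python version, not necessarily the book's own procedure; the names `echelon` and `is_compatible` are hypothetical, and `Fraction` keeps the arithmetic exact so that a degenerate equation is recognized as exactly $0 = 0$ rather than as a tiny floating-point residue.

```python
from fractions import Fraction


def echelon(A, b):
    """Reduce the augmented system [A | b] to the echelon form (7.34).

    Returns (rows, leads, degenerate): the r nonzero echelon rows (leading
    coefficient scaled to 1, RHS in the last entry), the column index of each
    leading unknown, and the right-hand sides d_k of the m - r equations 0 = d_k.
    """
    M = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]
    m, n = len(M), len(A[0])
    leads, r = [], 0
    for col in range(n):
        # Look for an equation (at or below row r) containing this unknown.
        pivot = next((i for i in range(r, m) if M[i][col] != 0), None)
        if pivot is None:
            continue  # this unknown never leads: it will be a free parameter
        M[r], M[pivot] = M[pivot], M[r]            # switch equations
        p = M[r][col]
        M[r] = [v / p for v in M[r]]               # leading coefficient -> 1
        for i in range(r + 1, m):                  # eliminate it further down
            f = M[i][col]
            M[i] = [vi - f * vr for vi, vr in zip(M[i], M[r])]
        leads.append(col)
        r += 1
        if r == m:
            break
    return M[:r], leads, [row[-1] for row in M[r:]]


def is_compatible(A, b):
    """Compatible exactly when every degenerate equation reads 0 = 0."""
    return all(d == 0 for d in echelon(A, b)[2])


# Example 7.6.2 (compatible) and Example 7.6.3 (incompatible).
A1, b1 = [[0, -1, 2], [1, 1, -3], [-1, 1, 1], [1, 2, -1]], [1, 0, -2, -1]
A2, b2 = [[1, 1, 1], [2, -1, 1], [-1, 2, 1], [1, -2, 1]], [0, -2, -1, 2]
print(is_compatible(A1, b1), is_compatible(A2, b2))   # True False
```

On Example 7.6.2 the routine reproduces the rows of (7.39) with one degenerate equation $0 = 0$; on Example 7.6.3 the degenerate equation comes out as $0 = 7$, as found by hand.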

A system of linear equations (7.30) is homogeneous if the constants $b_i$
on the RHS are all zero. Such a system always has a trivial solution with
all the unknowns equal to zero. There may be no further solutions, but if
the number of variables exceeds the number of equations, the last equation
of (7.32) will always contain more than one variable, at least one of which can
be chosen at will. Furthermore, the inconsistent equations $0 = d_k$ with $d_k \neq 0$ can never
arise for homogeneous equations, because elimination keeps every right-hand side equal to zero. Hence,




Box 7.6.2. A system of m homogeneous linear equations in n unknowns,
with $n > m$, always has a solution in which not all the unknowns are zero.
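Box 7.6.2 can be made constructive: reduce the system, pick one unknown that leads no equation, set it to 1 and any other free unknowns to 0, and solve for the leading unknowns. A minimal Python sketch under those choices follows; the function name `nontrivial_solution` is hypothetical, and the routine reduces fully (clearing each leading unknown from the equations above as well as below) so that each leading unknown can be read off directly.

```python
from fractions import Fraction


def nontrivial_solution(A):
    """Return a nonzero x with A x = 0 when A has more columns than rows (Box 7.6.2)."""
    M = [[Fraction(v) for v in row] for row in A]
    m, n = len(M), len(M[0])
    if n <= m:
        raise ValueError("Box 7.6.2 needs more unknowns than equations (n > m)")
    leads, r = [], 0
    for col in range(n):
        pivot = next((i for i in range(r, m) if M[i][col] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        p = M[r][col]
        M[r] = [v / p for v in M[r]]
        # Clear this unknown from every other equation (reduced echelon form).
        for i in range(m):
            if i != r and M[i][col] != 0:
                f = M[i][col]
                M[i] = [vi - f * vr for vi, vr in zip(M[i], M[r])]
        leads.append(col)
        r += 1
        if r == m:
            break
    # At most m unknowns lead an equation, so at least n - m >= 1 are free.
    free = next(k for k in range(n) if k not in leads)
    x = [Fraction(0)] * n
    x[free] = Fraction(1)          # choose this free unknown "at will"
    for row, lead in zip(M, leads):
        x[lead] = -row[free]       # every other free unknown is left at 0
    return x


print([str(v) for v in nontrivial_solution([[1, 1, 1], [1, -1, 2]])])   # ['-3/2', '1/2', '1']
```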


7.7 Problems 


7.1. Show that Equation (7.4) is a linear transformation. 

7.2. Verify that the operation of differentiation of any order is a linear transformation
on $\mathcal{P}_n[t]$.


7.3. Show that

$$L = P_2(x)\frac{d^2}{dx^2} + P_1(x)\frac{d}{dx} + P_0(x)$$

is a linear operator on the space of twice-differentiable functions.
7.4. Show that the coefficients in $\mathcal{P}_3[t]$ of the second derivative of an arbitrary
polynomial $f(t) = a_0 + a_1 t + a_2 t^2 + a_3 t^3$ can be obtained by the product of
the matrix of the second derivative obtained in Example 7.1.1 and the 4×1
column vector representing $f(t)$.

7.5. Express the element in the $i$th row and $j$th column of a unit matrix in
terms of the Kronecker delta.


7.6. Suppose $\mathbf{x}$ is an eigenvector of $T$ with eigenvalue $\lambda$. Show that, for any
nonzero constant $\alpha$, $\alpha\mathbf{x}$ is also an eigenvector of $T$ with the same eigenvalue.

7.7. Find the length of $\mathbf{a}_2$ of Example 7.4.1 in terms of $x_2$. Now show that
$\mathbf{a}_2/|\mathbf{a}_2| = \mathbf{e}_2$.


7.8. Show that the rotation of the plane affects all vectors in the plane. Hint: 
Try to find an eigenvector of the 2x2 rotation matrix (6.24). 

7.9. Find the eigenvalues and normalized (unit length) eigenvectors of the 
following matrices. In cases where the matrix is symmetric, verify directly 
that its eigenvectors corresponding to different eigenvalues are orthogonal. 


[Matrices (a) through (f) are not recoverable from the source text.]


7.10. Show that Box 7.4.1 is not necessarily true for a general inner product 
with matrix G. However, if G and T commute (i.e., if GT = TG), then Box 
7.4.1 holds. Hint: Follow the argument after Box 7.4.1 and see how far you 
can proceed. 

7.11. Show that the inner product defined in Equation (7.21) is indeed a 
positive definite inner product. 

7.12. Find the fourth Legendre polynomial using the results of Example 7.5.1. 

7.13. Find the first three Hermite polynomials using the standardization (or 
normalization) Equation (7.27). 

7.14. The volume element of a four-dimensional Euclidean space with Cartesian
coordinates x, y, z, and w is $dx\,dy\,dz\,dw$. In any other coordinate system,
it is given by a 4-dimensional generalization of the Jacobian (6.66).

(a) Write this Jacobian for a general transformation to coordinates s, t, u, 
and v where x, y, z, and w are functions of these new coordinates. 

(b) Now consider the 4-dimensional spherical coordinates:

$$\begin{aligned}
x &= r\sin\chi\sin\theta\cos\varphi,\\
y &= r\sin\chi\sin\theta\sin\varphi,\\
z &= r\sin\chi\cos\theta,\\
w &= r\cos\chi,
\end{aligned}$$

and calculate the 4-dimensional Jacobian to find the volume element of a 4-dimensional
sphere.

(c) With $0 \le \varphi < 2\pi$, $0 \le \theta \le \pi$, $0 \le \chi \le \pi$, find the volume of a 4-sphere of
radius $a$.


7.15. Determine the r of Equation (7.34) for each of the following systems of
linear equations and whether or not the system is compatible. If the system
is compatible, find a solution for it.

(a) $2x - y - 4z = 1,\quad x + 2y + 2z = 0,\quad -x - y + 6z = 3.$

(b) $x + y + z = -1,\quad 2x - y + 2z = -5,\quad 3x + 3y + z = 1.$

(c) $x + y + z = 2,\quad 2x - y + 2z = -2,\quad 3x + y - z = 4.$

(d) $2x + y - 2z = 2,\quad 3x - y - 4z = -1,\quad 3x + 4y - 2z = 7.$

(e) $3x + 2y = 7,\quad x + y + z = 6,\quad 5x + 4y + 2z = 19,\quad x - 2y = -5.$

(f) $x + 5y - z = 2,\quad 2x + y + 3z = -1,\quad -x + 3y + 2z = -3,\quad 3x + 2y - z = 4.$




Chapter 8 

Vectors in Relativity 


One of the most rewarding applications of vectors is to relativity. The special 
theory of relativity (STR) was a direct consequence of Maxwell’s equations, 
which summarize the entire theory of electromagnetism (see Section 15.4). 
These equations predict mathematically that there must exist electromagnetic 
(EM) waves which travel at the speed of light in empty space. This speed c is 
found in terms of purely electric and magnetic measurements: 


$$c = \frac{1}{\sqrt{\mu_0\epsilon_0}} = \frac{1}{\sqrt{(4\pi\times 10^{-7})(8.854\times 10^{-12})}} = 2.998\times 10^{8}\ \text{m/s},$$

where $\epsilon_0 = 1/(4\pi k_e)$ and $\mu_0 = 4\pi k_m$, with $k_e$ and $k_m$ the electric and magnetic
constants introduced in Chapter 1.
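The numerical value of $c$ quoted above is easy to reproduce. The short Python snippet below simply evaluates $1/\sqrt{\mu_0\epsilon_0}$ with the values given in the text; the variable names are illustrative only.

```python
import math

# Values quoted in the text (SI units).
k_m = 1e-7                    # magnetic constant of Chapter 1, so mu_0 = 4*pi*k_m
mu0 = 4 * math.pi * k_m
eps0 = 8.854e-12              # epsilon_0 = 1/(4*pi*k_e)
k_e = 1 / (4 * math.pi * eps0)

c = 1 / math.sqrt(mu0 * eps0)
print(f"k_e = {k_e:.4g}, c = {c:.4g} m/s")   # k_e = 8.988e+09, c = 2.998e+08 m/s
```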

Imagine two laboratories on two spaceships, $S_1$ and $S_2$, with $S_1$ behind
(and moving towards) $S_2$ at $0.9c$ relative to $S_2$. The physicists on $S_1$ perform
electric and magnetic experiments, measure $\epsilon_0$ and $\mu_0$, and conclude that
EM waves travel at 300,000 km/s in empty space. The physicists on $S_2$
also perform electric and magnetic experiments, measure $\epsilon_0$ and $\mu_0$, and also
conclude that EM waves travel at 300,000 km/s in empty space. Now a
physicist on $S_1$ takes a flashlight and sends a beam of light in the forward
direction in empty space. The consequence of Maxwell’s equations is that the
physicists on $S_2$, although they see $S_1$ moving towards them at $0.9c$ and the
physicists on $S_1$ measure the beam moving away from $S_1$ at $c$, conclude that the speed of the light
beam is $c$ and not $1.9c$, as expected from the Newtonian law of addition of
velocities.

To appreciate the strange consequence of Maxwell’s equations, consider
the following example: a train moves at 30 m/s and a passenger on it throws
a ball in the forward direction with a speed of 20 m/s relative to the train. A ground observer
measures the speed of the ball to be 30 + 20 = 50 m/s: velocities add. Here is
another familiar example: a car moves at 75 mph on a highway on which your
car is moving at 50 mph. The speed of the fast car relative to you is 25 mph.
You speed up to 70 mph. Then the other car appears to have “slowed down,”
because now you measure its speed relative to you to be only 5 mph. Go to
outer space and, while your spaceship is at rest, let someone on it fire a bullet forward at 500 mph.
Increase your speed to 450 mph, and the bullet appears to be moving at 50 mph
away from you. Increase your speed by another 100 mph. You catch up with
the bullet, and if you then decrease your speed by 50 mph, the bullet appears
stationary relative to you.

Now shoot a beam of light forward, and once the beam leaves your flashlight,
accelerate your spaceship to a speed of 299,000 km/s. Measure the
speed of the light beam. It is still 300,000 km/s, and not 1000 km/s, as intuitively
expected! Maxwell’s equations defy intuition, and the STR, which
is entirely based on these equations, is extremely counterintuitive. Let us
summarize these observations:


Box 8.0.1. (Principle of Relativity) Every time you detect an electromagnetic
wave, it moves at the rate of 300,000 km per second in vacuum,
regardless of the motion of its source or its detector. The speed of light in
vacuum is a universal constant.


An immediate consequence of the principle of relativity is the fact that time
is observer-dependent. As Einstein said, “Time is something that is measured
by clocks.” So, let us look at the effect of motion on clocks. The clock best
suited for this investigation is the “arm” of the Michelson-Morley apparatus
shown in Figure 8.1. It consists of a source S of light, or electromagnetic
waves, and a mirror M. The distance between S and M is L. Therefore, it
takes light $\Delta\tau_{\text{tick}} = 2L/c$ to go from S to M and back. If we place a light-sensitive
“ticker” at S, the clock will tick every $\Delta\tau_{\text{tick}}$ seconds. We call such
a clock a Michelson-Morley clock, or an MM clock, and $\Delta\tau_{\text{tick}}$ the proper
tick of the MM clock. $\Delta\tau_{\text{tick}}$ is the tick measured by an observer for whom
the clock is at rest, or for whom the beginning and the end of a tick occur at
the same location.



Figure 8.1: A Michelson-Morley clock. A “tick” of this clock occurs when the light 
signal makes a round trip along the length L. 
8.1 Proper and Coordinate Time

An MM clock is placed on a train and observed by two observers, O (on the
ground) and O' (on the train, moving to the right relative to O). Consider three events:
the emission of a light beam at S, its reflection at M, and its reception at
S. These three events constitute one tick. Let us denote them by $E_1$, $E_2$,
and $E_3$, respectively. How does O' see the ticking of the clock? The clock
is sitting right beside her, and she observes the whole process of ticking as
the light going straight up and coming straight down. She concludes that her
clock’s ticks are $\Delta\tau_{\text{tick}}$ long.

Now, let us see how O perceives the succession of these three events. Since
the clock is moving to the right, the light signal that leaves S will reach M
only after M has moved to the right. Thus, to O, the events $E_1$ and $E_2$ are
separated not only by a vertical distance, but also by a horizontal distance (see
Figure 8.2). Since the speed of light is the same for all observers, O concludes
that it takes light more than $L/c$ to travel each of the paths $E_1E_2$ and $E_2E_3$,
i.e., more than $2L/c$ for the round trip. Therefore, he
concludes that the clock on the train must tick slower!

We can quantify the above statement by referring to the triangle $E_1AE_2$
of Figure 8.2. Pythagoras’ theorem implies

$$(E_1E_2)^2 = (E_1A)^2 + (AE_2)^2.$$

Let the speed of the train be $v$ and the light beam’s travel time from S to M
be $\delta t$ according to O. Then $E_1A = v\,\delta t$, $AE_2 = L$, and $E_1E_2 = c\,\delta t$ with $c$ the (universal)
speed of light. Putting all of this in the above equation gives

$$(c\,\delta t)^2 = (v\,\delta t)^2 + L^2 \quad\Rightarrow\quad c^2(\delta t)^2 = v^2(\delta t)^2 + L^2. \tag{8.1}$$


Figure 8.2: A moving Michelson-Morley clock. The path of light (represented by a
black dot) is not a vertical line but a slanted one due to the motion of M.



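This yields

$$(\delta t)^2 = \frac{L^2/c^2}{1 - v^2/c^2} \quad\Rightarrow\quad \delta t = \frac{L/c}{\sqrt{1 - (v/c)^2}}.$$

Let us denote by $\Delta t_{\text{tick}}$ the duration of the light’s round trip as seen by O.
Then

$$\Delta t_{\text{tick}} = 2\,\delta t = \frac{2L/c}{\sqrt{1 - (v/c)^2}} = \frac{\Delta\tau_{\text{tick}}}{\sqrt{1 - (v/c)^2}}. \tag{8.2}$$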


In deriving this equation, we have tacitly assumed that motion does not affect
transverse lengths, so that the length L of the MM clock, which is perpendicular
to the direction of motion, does not change. To see why this must be so, consider the distance
between the two wheels of a train, and suppose that this distance shrinks¹ due to
its motion as seen by a ground observer. This means that the wheels will fall
between the rails. On the other hand, the engineer of the train sees the rails
moving and concludes that the distance between the rails shrinks, i.e., that the
wheels fall outside the rails. This contradicts the previous conclusion. Thus,
the length perpendicular to the direction of motion must not change.

Although Equation (8.2) is derived for a single tick, it really applies to all
time intervals, because any such interval can be measured as a multiple of a single tick
(and the tick can be made as short as we please by shortening L). We now
rewrite Equation (8.2) without the subscript “tick,” realizing that $\Delta\tau$ is the
proper time between any two events, i.e., the time interval between the two
events measured by a clock that is present at both events:

$$\Delta t = \frac{\Delta\tau}{\sqrt{1 - (v/c)^2}}. \tag{8.3}$$


$\Delta\tau$ can also be defined as the time measured by an observer for whom the two
events occur at the same spatial point. $\Delta t$, called the coordinate time, is
the time measured by another observer, moving relative to the first one with
speed $v$, for whom the two events occur at two different spatial points.
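Equation (8.3) is a one-line formula; the sketch below codes it in Python for a hypothetical MM clock with $L = 1.5$ m and $c$ rounded to $3\times10^8$ m/s, both chosen only for illustration. The function names are made up for this example.

```python
import math


def gamma(beta):
    """Factor 1/sqrt(1 - beta^2) of Equation (8.3), with beta = v/c and |beta| < 1."""
    if not -1 < beta < 1:
        raise ValueError("a clock cannot move at or faster than light")
    return 1 / math.sqrt(1 - beta * beta)


def coordinate_time(proper_time, beta):
    """Equation (8.3): Delta t = Delta tau / sqrt(1 - (v/c)^2)."""
    return gamma(beta) * proper_time


# Hypothetical MM clock with L = 1.5 m, so one proper tick is 2L/c = 1e-8 s (c = 3e8 m/s).
tick = 2 * 1.5 / 3.0e8
for beta in (0.0, 0.6, 0.99):
    # The tick of the moving clock, as timed by O, grows with beta.
    print(beta, coordinate_time(tick, beta))
```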


8.2 Spacetime Distance 

The most elegant way of relating an event’s space and time properties as
described by two observers is to use geometry. We start with the description
of the event itself. An event has a position and an instant of time. Therefore,
it can be represented by a set of four coordinates: three for position and
one for time. It is common to multiply the time $t$ by $c$ (to make a distance
out of it) and put it as the first coordinate. Thus, in a Cartesian coordinate
system, an event is described by $(ct, x, y, z)$. Geometrically, we have added
the extra “dimension” of time to the three-dimensional space to create the
four-dimensional spacetime.

At the heart of any geometry is the distance between two nearby points,
and how it is written in terms of the coordinates of the points. Euclidean
geometry started without coordinates, with the notion of the distance between
two points being “evident.” In fact, we use the properties of Euclidean
distance (such as Pythagoras’ theorem, which involves the three distances corresponding
to the three sides of a right triangle) to show that the distance
between two points whose Cartesian coordinates differ by $(\Delta x, \Delta y, \Delta z)$ is
$\sqrt{(\Delta x)^2 + (\Delta y)^2 + (\Delta z)^2}$.

1 The same argument applies to the case where the distance expands.

In the case of the spacetime geometry, we have started with coordinates.
Now we have to find a distance formula in terms of the difference between
coordinates of two events. We get some clues from Euclidean distance as
expressed in terms of coordinates. The first clue is that distance is observer-independent:
If observer O uses his Cartesian coordinate system to label point
$P_1$ by $(x_1, y_1, z_1)$ and $P_2$ by $(x_2, y_2, z_2)$, and finds

$$(P_1P_2)_O = \Delta r = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2 + (z_2 - z_1)^2},$$

and if observer O' uses her Cartesian coordinate system to label point $P_1$ by
$(x'_1, y'_1, z'_1)$ and $P_2$ by $(x'_2, y'_2, z'_2)$, and finds

$$(P_1P_2)_{O'} = \Delta r' = \sqrt{(x'_2 - x'_1)^2 + (y'_2 - y'_1)^2 + (z'_2 - z'_1)^2},$$

then $\Delta r' = \Delta r$. The second clue is that if $P_1$ and $P_2$ lie along a single axis of
an observer, then the distance is the (absolute value of the) difference between
the coordinates of $P_1$ and $P_2$.

Now consider two events $E_1$ and $E_2$, which occur at the same spatial
location according to O', with $E_2$ happening after $E_1$. This means that O'
(her clock) is present at both events, i.e., that $E_1$ and $E_2$ lie along the time
axis of O', and that O' is measuring the proper time interval between the
two events: $\Delta\tau = t'_2 - t'_1$. By the second clue above, $c\,\Delta\tau = c(t'_2 - t'_1)$ is
the distance we are looking for (again we multiply by $c$ to make a distance
out of it). We introduce the notation $\Delta s = c\,\Delta\tau$ and call $\Delta s$ the spacetime
distance or the invariant interval between the two events.

Another observer O assigns spacetime coordinates $(ct_1, x_1, y_1, z_1)$ to $E_1$
and $(ct_2, x_2, y_2, z_2)$ to $E_2$. Now the spatial separation between $E_1$ and $E_2$
according to O is

$$\sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2 + (z_2 - z_1)^2},$$

and since O' is at $E_1$ when it happens and at $E_2$ when it happens, this
expression is precisely the distance that O' travels in time $t_2 - t_1$ with respect
to O. Therefore, the speed of O' relative to O is

$$v = \frac{\sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2 + (z_2 - z_1)^2}}{t_2 - t_1}$$

or

$$v^2 = \frac{(x_2 - x_1)^2 + (y_2 - y_1)^2 + (z_2 - z_1)^2}{(t_2 - t_1)^2}.$$

Up to this point, we have not used any physics (except for the definition
of speed). Now comes the crucial final step. Equation (8.3) (which is a direct
result of Box 8.0.1) can now be used to find the expression of $\Delta s$ in terms of
coordinate differences. Equation (8.3) implies that

$$t_2 - t_1 = \Delta t = \frac{\Delta\tau}{\sqrt{1 - (v/c)^2}} = \frac{c\,\Delta\tau}{c\sqrt{1 - (v/c)^2}} = \frac{\Delta s}{c\sqrt{1 - (v/c)^2}}$$

or

$$\Delta s = c\,\Delta t\sqrt{1 - (v/c)^2} = c(t_2 - t_1)\sqrt{1 - v^2/c^2} = \sqrt{c^2(t_2 - t_1)^2 - v^2(t_2 - t_1)^2}.$$

Substituting the expression for $v^2$ above, we get

$$\Delta s = \sqrt{c^2(t_2 - t_1)^2 - (x_2 - x_1)^2 - (y_2 - y_1)^2 - (z_2 - z_1)^2}.$$

We rewrite this important formula as

$$(\Delta s)^2 = (c\,\Delta\tau)^2 = c^2(\Delta t)^2 - (\Delta x)^2 - (\Delta y)^2 - (\Delta z)^2. \tag{8.4}$$


Let’s emphasize the significance of this equation: If observer O uses his
Cartesian coordinate system to label event $E_1$ by $(ct_1, x_1, y_1, z_1)$ and $E_2$ by
$(ct_2, x_2, y_2, z_2)$, and finds

$$(\Delta s)^2 = c^2(t_2 - t_1)^2 - (x_2 - x_1)^2 - (y_2 - y_1)^2 - (z_2 - z_1)^2,$$

and if observer O' uses her Cartesian coordinate system to label event $E_1$ by
$(ct'_1, x'_1, y'_1, z'_1)$ and $E_2$ by $(ct'_2, x'_2, y'_2, z'_2)$, and finds

$$(\Delta s')^2 = c^2(t'_2 - t'_1)^2 - (x'_2 - x'_1)^2 - (y'_2 - y'_1)^2 - (z'_2 - z'_1)^2,$$

then $(\Delta s')^2 = (\Delta s)^2$. Thus, although events are coordinatized differently by
different observers, the spacetime distance between two events is universal.
In contrast to Newtonian physics, neither the time interval nor the spatial
distance between two events is universal in relativity.
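The invariance of (8.4) can be checked numerically. The Python sketch below works in arbitrary units with two hypothetical events `E1` and `E2`, and borrows the boost formulas $ct' = \gamma(ct + \beta x)$, $x' = \gamma(x + \beta ct)$, $y' = y$, $z' = z$, which are derived in Section 8.3 as Equation (8.18). It shows that $(\Delta s)^2$ stays the same for several values of $\beta$ while the time separation does not.

```python
import math


def interval_sq(e1, e2):
    """(Delta s)^2 of Eq. (8.4); events are (ct, x, y, z), so c is already in ct."""
    d0, d1, d2, d3 = (b - a for a, b in zip(e1, e2))
    return d0**2 - d1**2 - d2**2 - d3**2


def boost_x(event, beta):
    """Map (ct, x, y, z) of O to O' with the x-boost of Eq. (8.18)."""
    ct, x, y, z = event
    g = 1 / math.sqrt(1 - beta**2)
    return (g * (ct + beta * x), g * (x + beta * ct), y, z)


E1 = (0.0, 1.0, 2.0, -1.0)     # hypothetical events, arbitrary units
E2 = (5.0, 3.0, 0.5, 2.0)
for beta in (0.1, 0.5, 0.9, -0.7):
    F1, F2 = boost_x(E1, beta), boost_x(E2, beta)
    # The time separation F2[0] - F1[0] changes with beta; (Delta s)^2 does not.
    print(beta, F2[0] - F1[0], interval_sq(F1, F2))
```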

Example 8.2.1. Observer O spots a light beam (event $E_1$) at $(x_1, y_1, z_1)$ at time
$t_1$. A little later he finds the beam (event $E_2$) at $(x_2, y_2, z_2)$ at time $t_2$. What is the
spacetime interval for this light beam (i.e., for the two events $E_1$ and $E_2$)?

Since light travels from $(x_1, y_1, z_1)$ to $(x_2, y_2, z_2)$ with speed $c$, we have

$$\sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2 + (z_2 - z_1)^2} = c(t_2 - t_1).$$

Therefore,

$$(\Delta s)^2 = c^2(t_2 - t_1)^2 - (x_2 - x_1)^2 - (y_2 - y_1)^2 - (z_2 - z_1)^2 = 0,$$

which holds for any light signal, as the two events above are quite general. Thus the
spacetime distance between two different events that can be connected by a light
signal is zero. This is in contrast to the Euclidean case, where two different points
always have a nonzero distance between them. ■
8.3 Lorentz Transformation


Because of the intuitiveness of the concept of distance in Euclidean geometry, 
it is not essential to know how the coordinates of a point in one coordinate 
system (CS) are related to the coordinates of that same point in another 
CS. This transformation was found long after the maturity of the Euclidean 
geometry [see Section 6.1.3 and especially Equation (6.22) for a discussion of 
the two-dimensional version of coordinate transformation], and it was based 
entirely on the expression for the distance between two points in terms of the 
coordinates of those points. 

In spacetime geometry such a transformation is indispensable due to the 
counter-intuitive properties of the invariant interval (see Example 8.2.1 above). 
And while in Euclidean geometry, one can picture different coordinate systems 
and how they relate to one another (see Figure 6.7, for example), spacetime 
geometry does not readily allow such a direct pictorial representation without 
some preliminary algebraic discussion. 

Let $r_1 = (ct_1, x_1, y_1, z_1)$ and $r_2 = (ct_2, x_2, y_2, z_2)$ be the spacetime “position
vectors” of two events $E_1$ and $E_2$ relative to a coordinate system O.
Construct the difference

$$\Delta r = r_2 - r_1 = (ct_2 - ct_1,\ x_2 - x_1,\ y_2 - y_1,\ z_2 - z_1),$$

and define the square of the “length” of this vector to be $(\Delta s)^2$. In fact,
this definition is generalized to any four-dimensional vector. But first, let’s introduce a
notation.

A spacetime vector has the form $a = (a_0, a_1, a_2, a_3)$, which is usually
called a four-vector or a 4-vector.² It is also denoted by $(a_0, \mathbf{a})$, where
$\mathbf{a} = (a_1, a_2, a_3)$ is the space part (or the 3-vector part) of the 4-vector. A primary
example of a four-vector is $r = (ct, x, y, z) = (ct, \mathbf{r})$. The generalization
mentioned above defines the square of the length of $a$ (or the inner product
of $a$ with itself) as

$$a \cdot a = a_0^2 - a_1^2 - a_2^2 - a_3^2 = a_0^2 - \mathbf{a}\cdot\mathbf{a} = a_0^2 - |\mathbf{a}|^2. \tag{8.5}$$

Then it is easy (see Problem 8.1) to show that the inner product of any two
vectors must be given by

$$a \cdot b = a_0b_0 - a_1b_1 - a_2b_2 - a_3b_3 = a_0b_0 - \mathbf{a}\cdot\mathbf{b}. \tag{8.6}$$

In matrix form this can be written as

$$a \cdot b = (a_0\ a_1\ a_2\ a_3)\begin{pmatrix}1 & 0 & 0 & 0\\ 0 & -1 & 0 & 0\\ 0 & 0 & -1 & 0\\ 0 & 0 & 0 & -1\end{pmatrix}\begin{pmatrix}b_0\\ b_1\\ b_2\\ b_3\end{pmatrix} \tag{8.7}$$

or

$$a \cdot b = a\,\eta\,b, \quad\text{where}\quad \eta = \begin{pmatrix}1 & 0 & 0 & 0\\ 0 & -1 & 0 & 0\\ 0 & 0 & -1 & 0\\ 0 & 0 & 0 & -1\end{pmatrix}, \tag{8.8}$$

and $a$ and $b$ are the row and column vectors in Equation (8.7).

2 Note that the first component of $a$ has zero as an index, and is called the time component.
This is common in relativity.

A linear transformation that leaves the inner product of Equation (8.8),
and therefore the spacetime length $\Delta s$, invariant is called a Lorentz transformation.
By Equation (7.11), such a transformation $\Lambda$, which is a 4×4
matrix, satisfies

$$\Lambda^{\mathsf T}\eta\,\Lambda = \eta. \tag{8.9}$$

The study of the general structure of Lorentz transformations is beyond
the scope of this book. Here we shall confine ourselves to the Lorentz transformations
in two dimensions, in which the third and fourth components of
vectors are ignored. This means that vectors are of the form $a = (a_0, a_1)$,
$b = (b_0, b_1)$, the inner product is of the form $a \cdot b = a_0b_0 - a_1b_1$, and the
matrix $\eta$ reduces to

$$\eta = \begin{pmatrix}1 & 0\\ 0 & -1\end{pmatrix}.$$

In addition, the Lorentz transformations become 2×2 matrices.

Let $\Lambda = \begin{pmatrix}a_{11} & a_{12}\\ a_{21} & a_{22}\end{pmatrix}$ be a two-dimensional Lorentz transformation that
acts on 2-vectors in O to give the corresponding 2-vectors in O'. Then $\Lambda$ must
satisfy Equation (8.9), or

$$\begin{pmatrix}a_{11} & a_{21}\\ a_{12} & a_{22}\end{pmatrix}\begin{pmatrix}1 & 0\\ 0 & -1\end{pmatrix}\begin{pmatrix}a_{11} & a_{12}\\ a_{21} & a_{22}\end{pmatrix} = \begin{pmatrix}1 & 0\\ 0 & -1\end{pmatrix}, \tag{8.10}$$

which is equivalent to the following three equations [see (6.21) for a guide]:

$$a_{11}^2 - a_{21}^2 = 1, \qquad a_{11}a_{12} - a_{21}a_{22} = 0, \qquad a_{12}^2 - a_{22}^2 = -1. \tag{8.11}$$

As in the case of rotations (see Section 6.1.3), we can conclude that

$$a_{22}^2 = a_{11}^2, \qquad a_{12}^2 = a_{21}^2, \qquad a_{12}^2 - a_{11}^2 = -1. \tag{8.12}$$

So, all parameters are once again given in terms of $a_{11}$.

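To determine $a_{11}$, consider the 2-vector $(c\,\Delta t, \Delta x)$, the difference between
the time and position of two events in O. This 2-vector is represented by
$(c\,\Delta t', \Delta x')$ in O', and, by the definition of the Lorentz transformations,

$$\begin{pmatrix}c\,\Delta t'\\ \Delta x'\end{pmatrix} = \begin{pmatrix}a_{11} & a_{12}\\ a_{21} & a_{22}\end{pmatrix}\begin{pmatrix}c\,\Delta t\\ \Delta x\end{pmatrix}. \tag{8.13}$$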


Now suppose that $\Delta x = 0$, i.e., that the two events occur at the same location.
Then O is measuring the proper time, so that $\Delta t = \Delta\tau$. From Equation (8.13),
we also have $c\,\Delta t' = a_{11}c\,\Delta t$, or $\Delta t' = a_{11}\Delta t = a_{11}\Delta\tau$. Comparison with Equation (8.3),
in which the coordinate time is now $\Delta t'$, yields

$$a_{11} = \frac{1}{\sqrt{1 - (v/c)^2}}.$$

Introducing the two symbols $\beta = v/c$ and $\gamma = 1/\sqrt{1 - (v/c)^2}$, we obtain

$$a_{11} = \gamma = \frac{1}{\sqrt{1 - \beta^2}}. \tag{8.14}$$


The rest of the matrix elements can now be found. The first equation in
(8.12) gives $a_{22} = \pm\gamma$. To choose the correct sign for $a_{22}$, note that if O and
O' are not moving relative to one another, the coordinates do not change.
Therefore $\Lambda$ must be the unit matrix. So, $a_{22} = 1$ when $v = 0$. This can
happen only if $a_{22} = +\gamma$. With $a_{11} = a_{22} = \gamma$, the second equation in (8.11) now gives $a_{12} = a_{21}$;
and the third equation in (8.12) yields

$$a_{12}^2 = \gamma^2 - 1 = \frac{1}{1 - \beta^2} - 1 = \frac{\beta^2}{1 - \beta^2} = \beta^2\gamma^2 \quad\Rightarrow\quad a_{12} = \pm\beta\gamma.$$

The ambiguity in the sign comes from the choice we have for the direction of
motion. We absorb this choice of sign in $\beta$, and write

$$\Lambda = \begin{pmatrix}\gamma & \beta\gamma\\ \beta\gamma & \gamma\end{pmatrix}. \tag{8.15}$$


For the important case of the spacetime “position” vector $(ct, x)$, this yields

$$\begin{aligned}
ct' &= \gamma(ct + \beta x),\\
x' &= \gamma(x + \beta ct).
\end{aligned} \tag{8.16}$$




$\beta$ is positive (negative) when observer O, who uses $(ct, x)$ for events, travels
in the positive (negative) direction of O', who uses primed coordinates. Equation
(8.16) displays the celebrated Lorentz transformations in two spacetime
dimensions.
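As a quick check of (8.9) and (8.15), the Python sketch below builds $\Lambda$ for a given $\beta$, verifies $\Lambda^{\mathsf T}\eta\Lambda = \eta$, and contrasts it with an ordinary rotation, which fails the test. Plain nested lists are used for matrices so that nothing beyond the standard library is needed; all function names are illustrative.

```python
import math

ETA = [[1.0, 0.0], [0.0, -1.0]]     # the 2x2 eta of the two-dimensional case


def lorentz_2d(beta):
    """The matrix Lambda of Eq. (8.15)."""
    g = 1 / math.sqrt(1 - beta**2)
    return [[g, beta * g], [beta * g, g]]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def transpose(A):
    return [list(row) for row in zip(*A)]


def satisfies_89(L, tol=1e-12):
    """Check condition (8.9), Lambda^T eta Lambda = eta, entry by entry."""
    P = matmul(transpose(L), matmul(ETA, L))
    return all(abs(P[i][j] - ETA[i][j]) < tol for i in range(2) for j in range(2))


def rotation(theta):
    """An ordinary rotation, for contrast: it preserves x^2 + y^2, not (ct)^2 - x^2."""
    return [[math.cos(theta), -math.sin(theta)], [math.sin(theta), math.cos(theta)]]


print(satisfies_89(lorentz_2d(0.6)), satisfies_89(rotation(0.3)))   # True False
# Applying Lambda to the event (ct, x) = (2, 1) at beta = 0.6 reproduces Eq. (8.16):
print(matmul(lorentz_2d(0.6), [[2.0], [1.0]]))   # approximately [[3.25], [2.75]]
```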


Example 8.3.1. Emmy (observer O) is riding a train, standing in the
middle of one of its cars, of length $L$. At the two ends of the car are two firecrackers that
explode simultaneously. Karl (observer O') is standing on the platform watching
Emmy go by with speed $\beta$ (in units of $c$). Time zero for both coincides with the moment that Emmy
passes by Karl. Suppose that the simultaneous explosion of the two firecrackers
(according to Emmy) also takes place at $t = 0$. We want to see how all this appears
to Karl.

Assume that Emmy and Karl are located at their respective origins. Let the front
firecracker be labeled as 1 and the back as 2. Then the front and back firecrackers
have coordinates $(0, L/2)$ and $(0, -L/2)$, respectively, in Emmy’s RF. Karl, on the
other hand, measures the coordinates of the firecrackers as

$$ct'_1 = \gamma\beta L/2, \quad x'_1 = \gamma L/2, \qquad ct'_2 = \gamma(-\beta L/2), \quad x'_2 = \gamma(-L/2)$$
from Equation (8.16). This shows that, for Karl, the back firecracker explodes first.
In fact, it explodes before Emmy reaches him (at time $t' = 0$). The time difference
between the two events is

$$\Delta t' = t'_1 - t'_2 = \gamma\beta L/c.$$

Take $L$ to be 30 m. Then, for the time difference to be a mere one second, we must
have

$$30\gamma\beta = 3\times 10^{8} \quad\text{or}\quad \gamma\beta = 10^{7},$$

giving $\beta = 0.999999999999995$, awfully close to the speed of light!

On the other hand, if $L$ is a typical interstellar distance of, say, 10 light years,
then

$$\gamma\beta = \frac{\Delta t'}{10},$$

with $\Delta t'$ measured in years. For a time difference of one hour, we have $\gamma\beta =
1.14\times 10^{-5}$, yielding $\beta = 1.14\times 10^{-5}$ (here $\gamma \approx 1$), or $v = 3425$ m/s, an easily attainable
speed. ■
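The numbers in Example 8.3.1 follow from inverting $\gamma\beta = g$, which gives $\beta = g/\sqrt{1 + g^2}$. When $\beta$ is as close to 1 as $0.999999999999995$, computing $1 - \beta$ directly in floating point loses nearly all significant digits, so the Python sketch below uses the algebraically equivalent form $1 - \beta = 1/[s(s + g)]$ with $s = \sqrt{1 + g^2}$. It assumes $c = 3\times10^8$ m/s and a 365-day year, matching the rounding used in the example; the function names are hypothetical.

```python
import math


def beta_from_gamma_beta(gb):
    """Invert gamma*beta = gb:  beta = gb / sqrt(1 + gb^2)."""
    return gb / math.sqrt(1 + gb * gb)


def one_minus_beta(gb):
    """1 - beta, rewritten as 1 / (s (s + gb)) with s = sqrt(1 + gb^2).

    Subtracting a number this close to 1 from 1 in floating point would lose
    most of its digits, so the algebraically equivalent form is used instead.
    """
    s = math.sqrt(1 + gb * gb)
    return 1 / (s * (s + gb))


c = 3.0e8                        # m/s, the rounded value used in the example

# Car of length L = 30 m, Delta t' = 1 s:  gamma*beta = c * Delta t' / L.
gb_car = c * 1.0 / 30.0
print(gb_car, one_minus_beta(gb_car))    # gamma*beta = 1e7; 1 - beta is about 5e-15

# L = 10 light years, Delta t' = 1 hour (assuming a 365-day year of 8760 h).
gb_star = (1 / 8760) / 10
print(round(beta_from_gamma_beta(gb_star) * c))   # 3425 (m/s)
```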


Example 8.3.2. Observer O moves in the positive space direction of observer O'
at speed $v$ (or $\beta = v/c$). A particle moves at speed $\beta_p$ in the positive space direction
of O. What is $\beta'_p$, the speed of the particle relative to O'?

The definition of speed is distance between two events divided by time interval
between those events: spotting of the particle at a point in space and an instant in
time (first event), and spotting the particle at a nearby point a little later (second
event). For example, observer O assigns the coordinates $(ct, x)$ to the first event
and $(ct + c\Delta t, x + \Delta x)$ to the second event, and concludes that the (dimensionless)
speed of the particle is $\beta_p = \Delta x/(c\Delta t)$.

Similarly, observer O' assigns the coordinates $(ct', x')$ to the first event and
$(ct' + c\Delta t', x' + \Delta x')$ to the second event, and concludes that the speed of the
particle is $\beta'_p = \Delta x'/(c\Delta t')$, where $\Delta x'$ and $c\Delta t'$ are related to $\Delta x$ and $c\Delta t$ via the
Lorentz transformation. Using Equation (8.16), we find


$$\beta'_p = \frac{\Delta x'}{c\Delta t'} = \frac{\gamma(\Delta x + \beta c\Delta t)}{\gamma(c\Delta t + \beta\Delta x)};$$

dividing the numerator and denominator by $c\Delta t$, we get

$$\beta'_p = \frac{\beta_p + \beta}{1 + \beta\beta_p}, \tag{8.17}$$

which is called the relativistic law of addition of velocities.

One can show that if $0 < \beta_p < 1$ and $0 < \beta < 1$, then $0 < \beta'_p < 1$; indeed,
$1 - \beta'_p = (1 - \beta)(1 - \beta_p)/(1 + \beta\beta_p) > 0$. So, it is impossible to add two velocities
smaller than light speed, however close to it they are, and get a velocity larger than
light speed. Furthermore, if the particle happens to be a photon (or a light beam),
then $\beta_p = 1$ and

$$\beta'_p = \frac{1 + \beta}{1 + \beta} = 1,$$

verifying the universality of the speed of light, the starting point of relativity
theory! ■
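Equation (8.17) can be explored directly. The short Python sketch below (the function name is illustrative) shows that two speeds of $0.9c$ combine to less than $c$, that a light beam keeps $\beta'_p = 1$, and that at everyday speeds, such as the train example at the start of the chapter, the result is indistinguishable from the Newtonian sum.

```python
def add_velocities(beta_p, beta):
    """Relativistic law of addition of velocities, Eq. (8.17); speeds in units of c."""
    return (beta_p + beta) / (1 + beta * beta_p)


print(add_velocities(0.9, 0.9))    # about 0.9945, not the Newtonian 1.8
print(add_velocities(1.0, 0.9))    # 1.0: a light beam moves at c for O' as well
# The train example (20 m/s ball on a 30 m/s train): the correction is negligible.
c = 3.0e8
print(add_velocities(20 / c, 30 / c) * c)   # essentially 50 m/s
```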
In many situations, an observer in three dimensions moves along the x-axis.
Then, the y and z coordinates of events, being perpendicular to the
direction of motion, do not change. This suggests a slightly more general
Lorentz transformation than (8.16):

$$\begin{aligned}
ct' &= \gamma(ct + \beta x),\\
x' &= \gamma(x + \beta ct),\\
y' &= y,\\
z' &= z.
\end{aligned} \tag{8.18}$$


If an object moves in the xy-plane of an observer O with a velocity whose
components are $(v_x, v_y)$, then the same object moves in the x'y'-plane of
another observer O' with a velocity whose components are

$$\begin{aligned}
v_{x'} &= \frac{dx'}{dt'} = \frac{\gamma(dx + \beta c\,dt)}{\gamma(dt + \beta\,dx/c)} = \frac{v_x + \beta c}{1 + \beta v_x/c},\\
v_{y'} &= \frac{dy'}{dt'} = \frac{dy}{\gamma(dt + \beta\,dx/c)} = \frac{v_y}{\gamma(1 + \beta v_x/c)},
\end{aligned} \tag{8.19}$$


where $\beta$ is the (dimensionless) velocity of O relative to O'. In particular, if the object is light
and the angle it makes with the x-axis is $\alpha$, then $v_x = c\cos\alpha$, $v_y = c\sin\alpha$,
$v_{x'} = c\cos\alpha'$, and $v_{y'} = c\sin\alpha'$, and the equations above yield

$$\cos\alpha' = \frac{\cos\alpha + \beta}{1 + \beta\cos\alpha}, \qquad \sin\alpha' = \frac{\sin\alpha}{\gamma(1 + \beta\cos\alpha)}. \tag{8.20}$$


Now suppose that an observer O carries an EM radiation source which
radiates uniformly in all directions. If $\beta$ is very close to 1, then (8.20) implies
that $\cos\alpha' \to 1$ (and of course, $\sin\alpha' \to 0$) for any fixed emission angle $\alpha$ other than $\pi$.
Thus, as seen by O', an ultrarelativistic source of EM waves radiates (almost) only in the forward
direction.
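The forward beaming described above can be seen numerically from (8.20). The Python sketch below computes $\alpha'$ with `math.atan2`, which uses both $\cos\alpha'$ and $\sin\alpha'$ and therefore returns the angle in the correct quadrant; the function name `aberrate` is a hypothetical choice.

```python
import math


def aberrate(alpha, beta):
    """Angle alpha' (measured by O') of a light ray sent at angle alpha in O, Eq. (8.20)."""
    g = 1 / math.sqrt(1 - beta**2)
    denom = 1 + beta * math.cos(alpha)
    cos_p = (math.cos(alpha) + beta) / denom
    sin_p = math.sin(alpha) / (g * denom)
    return math.atan2(sin_p, cos_p)


# A ray sent at right angles to the motion in O; for alpha = 90 degrees, (8.20)
# gives cos(alpha') = beta, so the ray tips toward the forward direction.
for beta in (0.0, 0.5, 0.9, 0.99, 0.9999):
    print(beta, round(math.degrees(aberrate(math.pi / 2, beta)), 3))
```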




8.4 Four-Velocity and Four-Momentum 

In Newtonian mechanics, velocity is defined as the derivative of the position
vector with respect to time. In terms of (Cartesian) coordinates, an observer
O locates the object in motion by assigning it the coordinates $(x, y, z)$, and
differentiates these coordinates with respect to (the universal) time $t$ to get
the velocity of the object: $\mathbf{v} = (\dot x, \dot y, \dot z)$.

In relativity, the “position vector” is $r = (ct, x, y, z) = (ct, \mathbf{r})$, and there is no universal time. However, each moving object has a proper time $\tau$ (measured by a clock carried by the object), which is universal in the sense that all observers measure it to be the same [see Equation (8.4) and the comments after it]. Therefore, it is natural to define the dimensionless four-velocity as


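$$u \equiv \frac{dr}{ds} = \frac{1}{c}\frac{dr}{d\tau} = \left(\frac{dt}{d\tau}, \frac{1}{c}\frac{dx}{d\tau}, \frac{1}{c}\frac{dy}{d\tau}, \frac{1}{c}\frac{dz}{d\tau}\right) = \gamma\left(1, \frac{\dot x}{c}, \frac{\dot y}{c}, \frac{\dot z}{c}\right) = \gamma\,(1, \mathbf{v}/c), \tag{8.21}$$

where a dot represents differentiation with respect to the coordinate time $t$, and we used $dt = \gamma\,d\tau$ [see Equation (8.3)].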

An interesting property of the four-velocity is that its spacetime length is one:

$$u\cdot u = u_0^2 - u_1^2 - u_2^2 - u_3^2 = \gamma^2\left[1 - (\mathbf{v}/c)\cdot(\mathbf{v}/c)\right] = \gamma^2(1 - v^2/c^2) = 1, \tag{8.22}$$

from the definition of $\gamma$ in (8.14). The four-velocity of an object in the object's rest frame is $(1, 0, 0, 0)$, i.e., it is a unit vector in the time direction. If we define the four-acceleration $a$ as the rate of change of the four-velocity with respect to proper time, $a = du/d\tau$, then the inner product of the 4-velocity and the 4-acceleration of any object is zero: differentiating (8.22) with respect to $\tau$ gives $2\,u\cdot a = 0$, i.e., the 4-acceleration is $\eta$-orthogonal to the 4-velocity. Summarizing these two properties of the 4-velocity, we get

$$u\cdot u = 1, \qquad u\cdot a = 0. \tag{8.23}$$

Example 8.4.1. A particle is moving in the two-dimensional spacetime of an inertial frame on a path given parametrically as

$$t(\sigma) = b\sinh\sigma, \qquad x(\sigma) = cb\cosh\sigma,$$

where $\sigma$ is a dimensionless parameter. The differential of the particle's proper time is

$$(c\,d\tau)^2 = (c\,dt)^2 - (dx)^2 = (cb)^2\cosh^2\sigma\,(d\sigma)^2 - (cb)^2\sinh^2\sigma\,(d\sigma)^2 = (cb)^2(d\sigma)^2 \;\Rightarrow\; d\sigma = \frac{d\tau}{b},$$

and $\sigma = \tau/b$. Thus, as a function of the proper time, the path becomes

$$t(\tau) = b\sinh(\tau/b), \qquad x(\tau) = cb\cosh(\tau/b).$$

The components of the (dimensionless) 4-velocity are

$$u_0 = \frac{dt}{d\tau} = \cosh(\tau/b), \qquad u_1 = \frac{dx}{c\,d\tau} = \sinh(\tau/b),$$

which satisfy $u_0^2 - u_1^2 = 1$, as they should.

The acceleration of the particle has components

$$a_0 = \frac{du_0}{d\tau} = \frac{1}{b}\sinh(\tau/b), \qquad a_1 = \frac{du_1}{d\tau} = \frac{1}{b}\cosh(\tau/b).$$

It is easily verified that $a\cdot u = 0$ and that

$$a\cdot a = a_0^2 - a_1^2 = -\frac{1}{b^2}.$$

So, the particle has a uniform acceleration of magnitude $1/b$ (that is, $c/b$ in ordinary units, since $u$ is dimensionless). The negative sign in the last equation is due to the fact that the magnitude of the acceleration has to be defined as $-a\cdot a = a_1^2 - a_0^2$, with the space part appearing as positive (so that when $a_0$ is absent, we get back the Newtonian acceleration). ■
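The claims of Example 8.4.1 can be checked numerically. The Python sketch below differentiates the path by central finite differences rather than using the closed forms, then evaluates the Minkowski products with signature $(+, -)$. The values `C = 3.0` and `B = 2.0` are arbitrary illustrative units, and the step sizes are ad hoc choices, so agreement is only to finite-difference accuracy.

```python
import math

C = 3.0  # speed of light in arbitrary illustrative units
B = 2.0  # the constant b of Example 8.4.1 (illustrative value)

def path(tau):
    """(t, x) of the particle as functions of its proper time tau."""
    return B * math.sinh(tau / B), C * B * math.cosh(tau / B)

def four_velocity(tau, h=1e-5):
    """u = (dt/dtau, (1/c) dx/dtau) by central differences."""
    (tp, xp), (tm, xm) = path(tau + h), path(tau - h)
    return (tp - tm) / (2 * h), (xp - xm) / (2 * h * C)

def four_acceleration(tau, h=1e-4):
    """a = du/dtau, i.e. second derivatives of the path, by central differences."""
    (tp, xp), (t0, x0), (tm, xm) = path(tau + h), path(tau), path(tau - h)
    return (tp - 2 * t0 + tm) / h**2, (xp - 2 * x0 + xm) / (h**2 * C)

def dot(p, q):
    """Minkowski inner product in 1+1 dimensions with signature (+, -)."""
    return p[0] * q[0] - p[1] * q[1]

for tau in (0.0, 1.5, 4.0):
    u, a = four_velocity(tau), four_acceleration(tau)
    print(f"tau = {tau}: u.u = {dot(u, u):.8f}, u.a = {dot(u, a):.1e}, "
          f"a.a = {dot(a, a):.6f} (expected {-1 / B**2})")
```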
The (kinematic) 4-velocity leads to the (dynamic) 4-momentum: just multiply $u$ by $mc$; the factor $c$ gives the dimensionless 4-velocity the dimension of velocity. In a reference frame in which an object of mass $m$ moves with velocity $\mathbf{v}$, the 4-momentum $p$ is given by

$$p = (p_0, p_1, p_2, p_3) = (p_0, \mathbf{p}) = mc\,u = \gamma mc\,(1, \mathbf{v}/c) = (\gamma mc, \gamma m\mathbf{v}). \tag{8.24}$$


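The space part of the 4-momentum is

$$\mathbf{p} = \gamma m\mathbf{v} = \frac{m\mathbf{v}}{\sqrt{1 - v^2/c^2}} \tag{8.25}$$

and gives ordinary Newtonian momentum when $|\mathbf{v}| \ll c$, because in that limit, $\gamma \approx 1$. Therefore, we call $\mathbf{p}$ the relativistic momentum.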

What about $p_0$? How are we to interpret that? If we set $\gamma \approx 1$, we get $p_0 \approx mc$, which does not correspond to any Newtonian quantity.³ However, if we make the next best approximation to $\gamma$ (see Example 10.2.1 and Problem 10.8), i.e.,

$$\gamma = \frac{1}{\sqrt{1 - (v/c)^2}} \approx 1 + \frac{1}{2}\left(\frac{v}{c}\right)^2,$$

then

$$p_0 = mc\gamma \approx mc\left(1 + \frac{1}{2}\frac{v^2}{c^2}\right) \;\Rightarrow\; p_0 c \approx mc^2 + \frac{1}{2}mv^2.$$

The second term gives us the clue that $p_0 c$ must be the relativistic energy $E$. So we write

$$p = (p_0, \mathbf{p}) = (E/c, \mathbf{p}) = (\gamma mc, \gamma m\mathbf{v}), \qquad E = \gamma mc^2 = \frac{mc^2}{\sqrt{1 - (v/c)^2}}. \tag{8.26}$$
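To see how good the approximation $p_0 c \approx mc^2 + \frac{1}{2}mv^2$ is, the small Python sketch below compares the exact $E - mc^2 = (\gamma - 1)mc^2$ with the Newtonian $\frac{1}{2}mv^2$. It works in units with $c = 1$ and $m = 1$, an assumption made for brevity; the function name `kinetic_ratio` is illustrative.

```python
import math

def kinetic_ratio(beta):
    """(E - mc^2) / (m v^2 / 2) with m = c = 1, i.e. (gamma - 1) / (beta^2 / 2)."""
    gamma = 1.0 / math.sqrt(1.0 - beta**2)
    return (gamma - 1.0) / (0.5 * beta**2)

for beta in (0.001, 0.01, 0.1, 0.5, 0.9, 0.99):
    print(f"beta = {beta:<6} (E - mc^2)/(mv^2/2) = {kinetic_ratio(beta):.6f}")
```

The ratio tends to 1 as $\beta \to 0$, which is the sense in which the second term reproduces the Newtonian kinetic energy, and it grows without bound as $\beta \to 1$.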


An important special case of this is the 4-momentum $p$ of a particle in its rest frame:

$$p = (mc, \mathbf{0}) = (mc, 0, 0, 0). \tag{8.27}$$

The definition of the relativistic energy allows objects to have rest energy: when $v = 0$, we get

$$E = mc^2, \tag{8.28}$$

which states the equivalence of mass and energy and allows their conversion into one another.

The invariance of the length of a 4-vector tells us that $p\cdot p$ is a quantity that is independent of observers. From Equation (8.26), we get

$$p\cdot p = (E/c)^2 - |\mathbf{p}|^2 = \gamma^2 m^2 c^2 - \gamma^2 m^2 v^2 = \gamma^2 m^2 c^2(1 - v^2/c^2) = m^2 c^2,$$

which we rewrite for future reference:

$$p\cdot p = m^2 c^2 \qquad\text{or}\qquad E^2 - |\mathbf{p}|^2 c^2 = m^2 c^4. \tag{8.29}$$

³ One may interpret $mc$ as the momentum of an object moving at the speed of light. However, while objects moving at light speed are possible in Newtonian physics, relativity does not allow a massive object to go at the speed of light [see (8.25)].


Thus, although different observers measure different values for the energy and 3-momentum of an object, when they subtract the square of their value of momentum (times $c$) from the square of their corresponding value of energy, all get the same numerical value, namely the square of the mass of the object (times $c^4$).

Equation (8.29) allows particles with zero mass to have energy and momentum. For such particles,

$$E^2 - |\mathbf{p}|^2 c^2 = 0 \qquad\text{or}\qquad E = |\mathbf{p}|c. \tag{8.30}$$

Since $\mathbf{p}/E = \mathbf{v}/c^2$ [see Equation (8.26)], we conclude from (8.29) and (8.30) that


Box 8.4.1. A particle is massless if and only if it moves at light speed. 




The particle (quantum) of electromagnetic waves is the photon. It travels at the speed of light (obviously!). Therefore, it must be massless.

Example 8.4.2. A particle has 4-momentum $p$ relative to an observer $O'$ whose 4-velocity is $u'$. In the rest frame of this observer $u' = (1, 0, 0, 0)$, and if $p = (E'/c, \mathbf{p}')$ in this frame, then

$$p\cdot u' = E'/c.$$

Now consider another observer $O$ with respect to whom the 4-momentum of the particle is $p = (E/c, \mathbf{p})$ and the 4-velocity of $O'$ is $u' = (\gamma, \gamma\mathbf{v}/c)$. In the frame of $O$,

$$p\cdot u' = \gamma E/c - \gamma\,\mathbf{p}\cdot\mathbf{v}/c.$$

The invariance of the inner product now gives

$$E' = \gamma(E - \mathbf{p}\cdot\mathbf{v}). \tag{8.31}$$

In the special case in which the particle is at rest with respect to $O$, $\mathbf{p} = 0$ and $E = mc^2$. This leads to

$$E' = \gamma mc^2 = \frac{mc^2}{\sqrt{1 - (v/c)^2}},$$

which is the expected expression for the relativistic energy of a particle moving with speed $v$ relative to $O'$. ■
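Equation (8.31) can be cross-checked against the explicit Lorentz transformation (8.18). In the Python sketch below ($c = 1$), $O'$ moves with velocity $v$ along the $x$-axis of $O$. Because $\beta$ in (8.18) is the velocity of $O$ relative to $O'$, the boost uses $\beta = -v$, and $(E, p_x)$ transforms like $(ct, x)$. The restriction to motion along $x$, the sample numbers, and the function name `boost_to_primed_frame` are assumptions of this sketch.

```python
import math

def boost_to_primed_frame(E, px, py, v):
    """Transform (E, px, py) from O to O', where O' moves with velocity v along x
    relative to O (c = 1).  This is (8.18) applied to (E, px) with beta = -v;
    the transverse component py is unchanged."""
    beta = -v
    gamma = 1.0 / math.sqrt(1.0 - beta**2)
    return gamma * (E + beta * px), gamma * (px + beta * E), py

m, px, py, v = 1.0, 0.6, 0.8, 0.5
E = math.sqrt(m**2 + px**2 + py**2)          # from (8.29) with c = 1
E_p, px_p, py_p = boost_to_primed_frame(E, px, py, v)

gamma = 1.0 / math.sqrt(1.0 - v**2)
print("E' from the boost :", E_p)
print("E' from (8.31)    :", gamma * (E - px * v))   # here p . v = px * v
print("invariant in O    :", E**2 - px**2 - py**2)
print("invariant in O'   :", E_p**2 - px_p**2 - py_p**2)
```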


8.4.1 Relativistic Collisions 

Conservation of energy and momentum in relativistic collisions is stated succinctly in terms of the total four-momenta before and after: $p_{\text{tot}}^{(\text{before})} = p_{\text{tot}}^{(\text{after})}$, where in each case, $p_{\text{tot}}$ is the sum of the 4-momenta of all particles involved.

As a first example, consider two particles that collide and form a single third particle. Let the masses of the first two particles be $m_1$ and $m_2$. We can immediately find the mass $M$ of the third particle. Before doing so, we set $c = 1$ to avoid cluttering the calculations. This is common in high energy physics, in which energy, momentum, and mass are all measured in the same unit (usually the electron volt, eV). If desired, we can easily restore the factors of $c$ at the end by a simple dimensional analysis. With this convention, Equation (8.29) becomes $p\cdot p = m^2$.

The conservation of 4-momentum in the present situation is $p_1 + p_2 = P$, where $P$ is the four-momentum of the final particle. Since this is a vector equation, all components must be equal. In particular, separating the time and the space parts, we get

$$p_{01} + p_{02} = P_0, \quad\text{or}\quad E_1 + E_2 = E, \qquad \mathbf{p}_1 + \mathbf{p}_2 = \mathbf{P}, \tag{8.32}$$

which express the conservation of energy and momentum.

Squaring both sides of $p_1 + p_2 = P$ gives

$$(p_1 + p_2)\cdot(p_1 + p_2) = P\cdot P,$$

or

$$p_1\cdot p_1 + p_2\cdot p_2 + 2p_1\cdot p_2 = P\cdot P,$$

or

$$m_1^2 + m_2^2 + 2p_1\cdot p_2 = M^2. \tag{8.33}$$

Because of the invariance of the dot product, this equation holds in any inertial frame.

Let us evaluate (8.33) in the rest frame of the second particle, where $p_2 = (m_2, \mathbf{0})$ by (8.27), and the energy of the first particle is assumed to be $E_1$. Then

$$p_1\cdot p_2 = (E_1, \mathbf{p}_1)\cdot(m_2, \mathbf{0}) = E_1 m_2,$$

and Equation (8.33) immediately gives the mass of the final particle:

$$M^2 = m_1^2 + m_2^2 + 2m_2E_1, \qquad\text{or}\qquad M^2 = m_1^2 + m_2^2 + 2m_2E_1/c^2, \tag{8.34}$$

where the second equation restores the necessary powers of $c$. Note how the initial energy $E_1$ on the right-hand side has turned into (part of) the final mass $M$ on the left-hand side. This is how large accelerators create new particles out of the energy of collision.

We can also find the momentum of the final particle from the second equation in (8.32). This easily gives $\mathbf{P} = \mathbf{p}_1$, indicating that, in the rest frame of particle 2, the final particle moves in the initial direction of particle 1. The magnitude of $\mathbf{P}$ can be calculated in terms of energies and masses:

$$|\mathbf{P}| = |\mathbf{p}_1| = \sqrt{E_1^2 - m_1^2}. \tag{8.35}$$

The first equation in (8.32) gives the energy of the final particle:

$$E = E_1 + m_2. \tag{8.36}$$


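Combining Equations (8.35) and (8.36), we can obtain the speed of the final particle (recall that $|\mathbf{p}|/E = v$ when $c = 1$):

$$v = \frac{|\mathbf{P}|}{E} = \frac{\sqrt{E_1^2 - m_1^2}}{E_1 + m_2}. \tag{8.37}$$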
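The chain (8.34) to (8.37) is easy to evaluate numerically. The Python sketch below (units with $c = 1$) bundles it into one function, `fuse`, a name chosen here for illustration, and checks that the results are mutually consistent, i.e., that $E^2 - |\mathbf{P}|^2 = M^2$. The numbers $E_1 = 5$, $m_1 = 3$, $m_2 = 2$ are arbitrary.

```python
import math

def fuse(E1, m1, m2):
    """Particle 1 (energy E1, mass m1) hits particle 2 (mass m2) at rest and the
    two form a single particle.  Units with c = 1.  Returns (M, |P|, E, v)."""
    if E1 < m1:
        raise ValueError("the energy of particle 1 cannot be below its rest energy")
    M = math.sqrt(m1**2 + m2**2 + 2 * m2 * E1)   # (8.34)
    P = math.sqrt(E1**2 - m1**2)                 # (8.35)
    E = E1 + m2                                  # (8.36)
    return M, P, E, P / E                        # (8.37)

M, P, E, v = fuse(E1=5.0, m1=3.0, m2=2.0)
print(f"M = {M:.4f}, |P| = {P:.4f}, E = {E:.4f}, v = {v:.4f}")
print("E^2 - |P|^2 - M^2 =", E**2 - P**2 - M**2)   # vanishes up to rounding
```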


A more common collision has two particles initially and two finally. So the conservation of 4-momentum becomes $p_1 + p_2 = p_3 + p_4$. Separating the time and the space parts yields the conservation of energy and momentum:

$$E_1 + E_2 = E_3 + E_4, \qquad \mathbf{p}_1 + \mathbf{p}_2 = \mathbf{p}_3 + \mathbf{p}_4. \tag{8.38}$$

Squaring both sides of $p_1 + p_2 = p_3 + p_4$ gives

$$m_1^2 + m_2^2 + 2p_1\cdot p_2 = m_3^2 + m_4^2 + 2p_3\cdot p_4, \tag{8.39}$$

which holds in any inertial frame. Evaluating this equation in the rest frame of the second particle yields

$$m_1^2 + m_2^2 + 2m_2E_1 = m_3^2 + m_4^2 + 2(E_3E_4 - \mathbf{p}_3\cdot\mathbf{p}_4). \tag{8.40}$$

In this frame, Equation (8.38) becomes $E_1 + m_2 = E_3 + E_4$ and $\mathbf{p}_1 = \mathbf{p}_3 + \mathbf{p}_4$. Solving for $E_4$ and $\mathbf{p}_4$ from these equations and substituting the results in (8.40) yields (after some algebra and using $E_3^2 - |\mathbf{p}_3|^2 = m_3^2$)

$$m_1^2 + m_2^2 + 2m_2E_1 = m_4^2 - m_3^2 + 2E_3(E_1 + m_2) - 2\mathbf{p}_1\cdot\mathbf{p}_3,$$

or

$$m_1^2 + m_2^2 + 2m_2E_1 = m_4^2 - m_3^2 + 2E_3(E_1 + m_2) - 2|\mathbf{p}_1||\mathbf{p}_3|\cos\theta_{13}, \tag{8.41}$$

where $\theta_{13}$ is the scattering angle of the third particle. Once the energy $E_1$ of the initial incident particle is known, Equation (8.41) gives the scattering angle as a function of the energy of the third particle ($|\mathbf{p}_1|$ and $|\mathbf{p}_3|$ are related to $E_1$ and $E_3$, respectively).

Example 8.4.3. The particle nature of light, which had been proposed by Einstein in his explanation of the photoelectric effect, was demonstrated by Compton in what is now called Compton scattering. In this scattering, a photon of energy $E$ is scattered off a stationary electron of mass $m_e$. The scattered photon is detected at an angle $\theta$ from the direction of the incident photon. What is the change in the wavelength of the photon as a function of $\theta$?

In (8.41), let 1 denote the incident photon, 2 the stationary electron, 3 the scattered photon, and 4 the scattered electron. Let $E'$ denote the energy of the scattered photon. Then, with $m_1 = m_3 = 0$ (so that $|\mathbf{p}_1| = E$ and $|\mathbf{p}_3| = E'$), Equation (8.41) becomes

$$m_e^2 + 2m_eE = m_e^2 + 2E'(E + m_e) - 2EE'\cos\theta,$$

or

$$m_eE = E'(E + m_e) - EE'\cos\theta \;\Rightarrow\; m_e(E - E') = EE'(1 - \cos\theta).$$

Restoring the factors of $c$ and noting that $E = hc/\lambda$, we obtain

$$m_ec^2\left(\frac{hc}{\lambda} - \frac{hc}{\lambda'}\right) = \left(\frac{hc}{\lambda}\right)\left(\frac{hc}{\lambda'}\right)(1 - \cos\theta),$$

which can be simplified to

$$\Delta\lambda = \lambda' - \lambda = \frac{h}{m_ec}(1 - \cos\theta) = \lambda_c(1 - \cos\theta), \tag{8.42}$$

where $\lambda_c = h/m_ec$ is called the Compton wavelength of the electron. By measuring the difference between the wavelengths of scattered and incident photons, Compton could verify Equation (8.42) and demonstrate that light had particle properties. ■
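The Python sketch below checks (8.42) numerically in SI units. It solves $m_ec^2(E - E') = EE'(1 - \cos\theta)$ for $E'$, converts both energies to wavelengths with $E = hc/\lambda$, and compares $\lambda' - \lambda$ with $\lambda_c(1 - \cos\theta)$. The constants are the exact SI values of $h$ and $c$ and the CODATA 2018 electron mass; the incident X-ray wavelength of 0.0711 nm is an illustrative choice.

```python
import math

H = 6.62607015e-34      # Planck constant, J s (exact in SI)
C = 299792458.0         # speed of light, m/s (exact in SI)
M_E = 9.1093837015e-31  # electron mass, kg (CODATA 2018)

def scattered_energy(E, theta):
    """E' obtained by solving m_e c^2 (E - E') = E E' (1 - cos theta) for E'."""
    mc2 = M_E * C**2
    return E / (1.0 + (E / mc2) * (1.0 - math.cos(theta)))

def compton_shift(theta):
    """Delta lambda = (h / m_e c)(1 - cos theta), Eq. (8.42)."""
    return H / (M_E * C) * (1.0 - math.cos(theta))

lam = 0.0711e-9                      # incident wavelength in m (illustrative)
E = H * C / lam
for deg in (0, 45, 90, 180):
    theta = math.radians(deg)
    lam_p = H * C / scattered_energy(E, theta)
    print(f"theta = {deg:3d} deg: lambda' - lambda = {lam_p - lam:.4e} m,"
          f" lambda_c (1 - cos theta) = {compton_shift(theta):.4e} m")
```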


8.4.2 Second Law of Motion 

Newtonian mechanics defines force as the rate of change of momentum. We generalize this to relativity and define

$$f \equiv \frac{dp}{d\tau} = mc\,\frac{du}{d\tau} = mc\,a, \tag{8.43}$$

where $\tau$ is the proper time of the moving object with mass $m$, four-velocity $u$, and four-momentum $p = mc\,u$. Let us explore the meaning of the components of $f$. In a particular inertial frame, we assume that Newton's second law holds:

$$\frac{d\mathbf{p}}{dt} = \mathbf{F}, \tag{8.44}$$


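where $\mathbf{p}$ is the space part of the 4-momentum. The space part of $f$ can now be written as

$$\mathbf{f} = \frac{d\mathbf{p}}{d\tau} = \frac{d\mathbf{p}}{dt}\frac{dt}{d\tau} = \gamma\mathbf{F}. \tag{8.45}$$

The time part of $f$ is a little trickier. First note that

$$f_0 = \frac{dp_0}{d\tau} = \frac{1}{c}\frac{dE}{d\tau}.$$

Next differentiate (8.29) with respect to $\tau$ to obtain $E\,(dE/d\tau) = c^2\,\mathbf{p}\cdot(d\mathbf{p}/d\tau)$. Finally use $\mathbf{p}/E = \mathbf{v}/c^2$ to arrive at

$$f_0 = \frac{1}{c}\frac{dE}{d\tau} = \frac{1}{c}\,\frac{c^2\mathbf{p}}{E}\cdot\frac{d\mathbf{p}}{d\tau} = \frac{1}{c}\,\mathbf{v}\cdot\gamma\mathbf{F} = \gamma\boldsymbol{\beta}\cdot\mathbf{F},$$

where $\boldsymbol{\beta} = \mathbf{v}/c$. Thus,

$$f = (f_0, \mathbf{f}) = (\gamma\boldsymbol{\beta}\cdot\mathbf{F}, \gamma\mathbf{F}). \tag{8.46}$$

The fact that $f_0 = \gamma\boldsymbol{\beta}\cdot\mathbf{F}$ could also be obtained by using $f\cdot u = 0$, which is a result of Equation (8.43) and the orthogonality of the 4-velocity and 4-acceleration (see Problem 8.13).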
Example 8.4.4. Let a constant force act on a particle of mass $m$ in some inertial frame. What is the speed of the particle at time $t$ if it starts from rest?

Equation (8.44) can be trivially integrated to give $\mathbf{p} = \mathbf{F}t$. Since the force is constant, the motion takes place in one dimension. So, we can ignore the vector sign and (remembering that $\beta = v/c$) write

$$m\gamma v = Ft, \qquad\text{or}\qquad m\gamma\beta = \frac{Ft}{c}, \qquad\text{or}\qquad \frac{\beta}{\sqrt{1 - \beta^2}} = \frac{Ft}{mc}.$$

Squaring both sides and solving for $\beta$ gives

$$\beta = \frac{Ft/mc}{\sqrt{1 + (Ft/mc)^2}}, \qquad\text{or}\qquad v = \frac{Ft/m}{\sqrt{1 + (Ft/mc)^2}}. \tag{8.47}$$

Note that for large $t$ (i.e., when $Ft \gg mc$), $\beta \approx 1$ or $v \approx c$. However, the particle can never attain the speed of light no matter how long we wait. On the other hand, if $Ft \ll mc$, then $v \approx (F/m)t$, which is the Newtonian speed of a particle moving with constant acceleration.

It is interesting to consider a particle having a constant acceleration $F/m$ of $10\ \text{m/s}^2$ (approximately Earth's gravitational acceleration). How long does it take to attain a speed of $0.999c$? Over 21 years! (See Problem 8.14.) On the other hand, Newtonian mechanics requires under one year to achieve the same speed! ■
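The figures quoted at the end of Example 8.4.4 follow from inverting (8.47): $t = (mc/F)\,\beta/\sqrt{1 - \beta^2}$. The Python sketch below evaluates this and the Newtonian $t = \beta c/(F/m)$ for $F/m = 10\ \text{m/s}^2$; the Julian year of 365.25 days is the conversion assumed here, and the function names are illustrative.

```python
import math

C = 299792458.0                 # speed of light, m/s
YEAR = 365.25 * 24 * 3600.0     # Julian year in seconds

def beta_at(t, accel):
    """Eq. (8.47) with F/m = accel: speed (in units of c) after time t from rest."""
    x = accel * t / C
    return x / math.sqrt(1.0 + x**2)

def relativistic_time(beta, accel):
    """Time to reach speed beta*c from rest, obtained by inverting (8.47)."""
    return beta / math.sqrt(1.0 - beta**2) * C / accel

def newtonian_time(beta, accel):
    """Time to reach beta*c under constant Newtonian acceleration."""
    return beta * C / accel

for beta in (0.5, 0.9, 0.999):
    t_rel = relativistic_time(beta, 10.0)
    t_newt = newtonian_time(beta, 10.0)
    print(f"beta = {beta}: relativistic {t_rel / YEAR:.2f} yr, Newtonian {t_newt / YEAR:.2f} yr")
```

The same functions can be called with other target speeds, such as the one in Problem 8.14.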


8.5 Problems 

8.1. Show that Equation (8.6) follows from Equation (8.5). Hint: Consider the three vectors a, b, and c = a + b.

8.2. Multiply the matrices in Equation (8.10) to obtain the three equations 
of (8.11). Solve these equations to find all matrix elements in terms of an- 

8.3. In Example 8.3.1, Emmy receives the two signals from the explosions at 
the same time. 

(a) Show that this time is $L/(2c)$ according to Emmy, and $\gamma L/(2c)$ according to Karl.

(b) Let $T'_1$ and $T'_2$ denote the times that Karl receives the signal from the front and back firecrackers, respectively. Show that

$$T'_1 = \frac{L}{2c}\sqrt{\frac{1+\beta}{1-\beta}}, \qquad T'_2 = \frac{L}{2c}\sqrt{\frac{1-\beta}{1+\beta}}.$$

(c) How is $\Delta T' = T'_1 - T'_2$ related to $\Delta t'$ calculated in Example 8.3.1? Discuss your answer.

8.4. Show that the relativistic law of addition of velocities (8.17) prohibits the sum of two large velocities from being larger than the speed of light. Hint: If $\beta_1$ and $\beta_2$ denote the two speeds in units of $c$, multiply both sides of $\beta_2 < 1$ by $1 - \beta_1$.

8.5. Show that the 4-acceleration is $\eta$-orthogonal to the 4-velocity.
8.6. Provide the details of the proof of the statement: a particle is massless if and only if it moves at light speed.


8.7. Apply (8.31) to a photon moving in the $x$-direction and use $|\mathbf{p}| = E/c$ to show that

$$E' = \sqrt{\frac{1-\beta}{1+\beta}}\;E.$$

Now use $E = hc/\lambda$ to find a formula for the relativistic Doppler shift.


8.8. Two identical particles of mass $m$ approach each other along a straight line with speed $v = \beta c$ as measured in the lab frame. Show that the energy of one particle as measured in the rest frame of the other is

$$\frac{1 + \beta^2}{1 - \beta^2}\,mc^2.$$


8.9. A particle of mass $m$ and relativistic energy $4mc^2$ collides with another stationary particle of mass $2m$ and sticks to it. What is the mass of the resulting composite particle?

8.10. An electron of kinetic energy 1 GeV ($10^9$ eV) strikes a positron (antielectron) at rest, and the two particles annihilate each other and produce two photons, one moving in the forward direction (the direction the electron had before the collision) and the other in the backward direction. What are the energies of the two photons? The masses (times $c^2$) of the electron and the positron are the same and equal to 0.511 MeV (1 MeV = $10^6$ eV).

8.11. A particle of mass $m$ and energy $E$ collides with an identical particle at rest. The collision results in the formation of a single particle. Show that the mass and the speed of the formed particle are, respectively, $\sqrt{2m(E + m)}$ and $\sqrt{(E - m)/(E + m)}$, assuming that $c = 1$.

8.12. A photon of energy $E$ is absorbed by a stationary nucleus of mass $m$. The collision results in an excitation of the nucleus. Show that the mass and the speed of the excited nucleus are, respectively, $\sqrt{m(2E + m)}$ and $E/(E + m)$, assuming that $c = 1$.

8.13. Use Equations (8.21), (8.43), (8.45), and the orthogonality of the 4-velocity and 4-acceleration to show that $f_0 = \gamma\boldsymbol{\beta}\cdot\mathbf{F}$.

8.14. How long does it take a particle to attain a speed of $0.999c$ if its acceleration is $10\ \text{m/s}^2$? What is the answer based on Newtonian mechanics? How do the answers change if the ultimate speed of the particle is $0.99999c$?
Chapter 9

Infinite Series 


Physics is an exact science of approximation. Although this statement sounds like an oxymoron, it does summarize the nature of physics. All the laws we deal with in physics are mathematical laws, and as such, they are exact. However, once we try to apply them to Nature, they become only approximations. Therefore, methods of approximation play a central role in physics. One such method is infinite series, which we study in this chapter.


9.1 Infinite Sequences 


An infinite sequence is an association between the set of natural numbers (often zero is also included) and the real numbers, so that for every natural number $k$ there is a real number $s_k$. Instead of the association, one calls the collection of real numbers the infinite sequence. Two common notations for a sequence are an indicated list, and enclosure in a pair of braces, as given below:

$$\{s_1, s_2, \ldots, s_k, \ldots\} = \{s_k\}_{k=1}^{\infty}.$$

Instead of $k$, one can use any other symbol usually used for natural numbers such as $i$, $j$, $n$, $m$, etc. We call $s_n$ the $n$th term of the sequence.

In practice, elements of a sequence are given by a rule or formula. The 
following are examples of sequences: 


111 


1 — — 

’ 2 3 ’ 3 3 ’ 



(9.1) 


An important sequence is the sequence of partial sums in which each 
term is a sum. Examples of such sequences are the following: 


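$$\left\{1,\ 1+\tfrac{1}{2},\ 1+\tfrac{1}{2}+\tfrac{1}{4},\ \ldots\right\}, \quad \left\{1,\ 1+\tfrac{1}{2},\ 1+\tfrac{1}{2}+\tfrac{1}{3},\ \ldots\right\}, \quad \left\{1,\ 1+\tfrac{1}{2^3},\ 1+\tfrac{1}{2^3}+\tfrac{1}{3^3},\ \ldots\right\}, \quad \left\{1,\ 1+1,\ 1+1+\tfrac{1}{2!},\ \ldots\right\}.$$

The $n$th terms of the sequences above are, respectively,

$$s_n = 1 + \frac{1}{2} + \frac{1}{4} + \cdots + \frac{1}{2^n} = \sum_{k=0}^{n} \frac{1}{2^k}, \qquad s_n = 1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{n} = \sum_{i=1}^{n} \frac{1}{i},$$

$$s_n = 1 + \frac{1}{2^3} + \frac{1}{3^3} + \cdots + \frac{1}{n^3} = \sum_{j=1}^{n} \frac{1}{j^3}, \qquad s_n = 1 + 1 + \frac{1}{2!} + \cdots + \frac{1}{n!} = \sum_{k=0}^{n} \frac{1}{k!},$$

so that the sequences can be written, respectively, as

$$\left\{\sum_{k=0}^{n} \frac{1}{2^k}\right\}_{n=0}^{\infty}, \quad \left\{\sum_{i=1}^{n} \frac{1}{i}\right\}_{n=1}^{\infty}, \quad \left\{\sum_{j=1}^{n} \frac{1}{j^3}\right\}_{n=1}^{\infty}, \quad \left\{\sum_{k=0}^{n} \frac{1}{k!}\right\}_{n=0}^{\infty}. \tag{9.2}$$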


In the last sequence, we have used the usual definition, $0! = 1$. A sequence is said to converge to the number $s$ or to have limit $s$ if for every positive (usually very small) real number $\epsilon$ there exists a (usually large) natural number $N$ such that $|s_n - s| < \epsilon$ whenever $n > N$. We then write

$$\lim_{n\to\infty} s_n = \lim_{\nu\to\infty} s_\nu = \lim_{k\to\infty} s_k = \lim_{\alpha\to\infty} s_\alpha = s. \tag{9.3}$$

Note the freedom of choice in using the symbol of the limit. A sequence that 
does not converge is said to diverge. The first three sequences in Equation 
(9.1) are convergent and their limits are 



The last sequence diverges because there is no single number to which the 
terms get closer and closer. 
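The $\epsilon$-$N$ definition can be explored numerically for the partial-sum sequences of (9.2). The Python sketch below (the helper names `partial_sums` and `first_n_within` are hypothetical) finds, for a given $\epsilon$, the first index from which $|s_n - s| < \epsilon$, using the known limits 2 for $\sum 1/2^k$ and $e$ for $\sum 1/k!$. Because all terms are positive, these partial sums increase monotonically toward the limit, so once a term is within $\epsilon$ every later term is too; without that property a finite search could not certify $N$. For the harmonic sequence it simply prints partial sums, which keep growing. A computer can only suggest convergence or divergence, not prove it.

```python
import math
from itertools import count

def partial_sums(term, start):
    """Yield (n, s_n) with s_n = term(start) + ... + term(n)."""
    s = 0.0
    for n in count(start):
        s += term(n)
        yield n, s

def first_n_within(term, start, limit, eps):
    """First n with |s_n - limit| < eps.  For positive terms the partial sums
    increase toward the limit, so every later s_n is within eps as well."""
    for n, s in partial_sums(term, start):
        if abs(s - limit) < eps:
            return n

def geometric(k):
    return 0.5 ** k

def exponential(k):
    return 1.0 / math.factorial(k)

def harmonic(i):
    return 1.0 / i

for eps in (1e-3, 1e-6, 1e-9):
    print(f"eps = {eps:g}: sum 1/2^k from n = {first_n_within(geometric, 0, 2.0, eps)},"
          f" sum 1/k! from n = {first_n_within(exponential, 0, math.e, eps)}")

# The harmonic partial sums have no limit to compare with; they keep growing.
for n, s in partial_sums(harmonic, 1):
    if n in (10, 100, 1000, 10000):
        print(f"harmonic s_{n} = {s:.4f}")
    if n == 10000:
        break
```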

There are many ways that a sequence can converge to its limit. For instance, the terms $s_n$ may steadily increase toward $s$ after some large integer $N$, so that for all $n > N$, $s_n < s_{n+1} < s_{n+2} < s_{n+3} < \cdots$.¹ In this case we say that the sequence is monotone increasing. If the terms $s_n$ steadily decrease toward $s$ after some large integer $N$, the sequence is called monotone decreasing. A sequence may bounce back and forth on either side of its limit, getting closer and closer to it. A sequence is called bounded if there exist two numbers $A$ and $B$ such that

$$A < s_n < B \quad \text{for all } n.$$

A sequence may be bounded but divergent. Various forms of convergence and divergence are depicted in Figure 9.1.

Figure 9.1: Types of sequences and modes of their convergence: (a) convergent, (b) convergent monotone increasing, (c) convergent monotone decreasing, (d) divergent monotone increasing, (e) divergent bounded, (f) divergent unbounded.

A sequence may have an upper and/or a lower limit. The upper limit is 
a number s such that there are infinitely many n’s with the property that s n 
is very close to s if n is large enough, and there is no other number larger 
than s with the same property. Similarly, the lower limit is a number s such 
that there are infinitely many n’s with the property that s n is very close to 
s if n is large enough, and there is no other number smaller than s with the 
same property. The last sequence of Equation (9.1) has an upper limit of 1 and a lower limit of $-1$. It is intuitively obvious that a sequence converges if and only if its upper and lower limits are finite and equal. For instance, the sequence $\{(-1)^n/n\}_{n=1}^{\infty}$ converges to the single limit 0 after bouncing left and right of it infinitely many times.
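A finite computation can only hint at upper and lower limits, but it illustrates the definitions. The Python sketch below takes the largest and smallest values over a long stretch of late terms, a heuristic stand-in for the upper and lower limits rather than a proof. It does this for $(-1)^n$, a simple sequence with upper limit 1 and lower limit $-1$ chosen here for illustration, and for $(-1)^n/n$, whose window shrinks toward the single limit 0.

```python
def tail_extremes(s, n_start, length=1000):
    """Largest and smallest of s(n) for n_start <= n < n_start + length."""
    tail = [s(n) for n in range(n_start, n_start + length)]
    return max(tail), min(tail)

def alternating(n):
    return (-1) ** n

def damped(n):
    return (-1) ** n / n

for n0 in (10, 1_000, 100_000):
    print(n0, tail_extremes(alternating, n0), tail_extremes(damped, n0))
```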

One can decide whether a sequence converges or not without knowing its 
limit: 


¹ We often use the loose phrase: “For large enough $n$, ....” The precise statement would be: “There exists an $N$ such that for all $n > N$, ....”


Box 9.1.1. (Cauchy Criterion). A sequence $\{s_n\}$ converges if and only if the difference $s_n - s_m$ approaches zero as both $m$ and $n$ approach infinity.


We can add, subtract, multiply, and divide two convergent sequences term 
by term and obtain a new sequence. The limit of the new sequence is obtained 
by the corresponding operation on the limits. Thus, if

$$\lim_{n\to\infty} x_n = x, \qquad \lim_{n\to\infty} y_n = y,$$

then

$$\lim_{n\to\infty}(x_n \pm y_n) = x \pm y, \qquad \lim_{n\to\infty}(x_n y_n) = xy, \qquad \lim_{n\to\infty}\frac{x_n}{y_n} = \frac{x}{y},$$

provided, of course, that $y \neq 0$ when it is in the denominator.


9.2 Summations 


We have been using summation signs on a number of occasions, and we shall be making heavy use of them in this chapter as well. It is appropriate at this point to study some of the properties associated with such sums. Every summation has a dummy index which has a lower limit, usually written under the summation symbol $\sum$, and an upper limit, usually written above it. The limits are always fixed, but the dummy index can be any symbol one wishes to use except the symbols used in the expression being summed. Therefore, all the following sums are identical:

$$\sum_{i=1}^{N} a_i x^i, \qquad \sum_{k=1}^{N} a_k x^k, \qquad \sum_{\alpha=1}^{N} a_\alpha x^\alpha, \qquad \sum_{\beta=1}^{N} a_\beta x^\beta, \qquad \sum_{\mu=1}^{N} a_\mu x^\mu. \tag{9.4}$$

It is not a good idea, however, to use $a$ or $x$ as the dummy index for the summation above!

When adding or subtracting sums of equal length, it is better to use the same symbol for the dummy index of the sum:

$$\sum_{i=1}^{N} a_i + \sum_{j=1}^{N} b_j = \sum_{i=1}^{N} a_i + \sum_{i=1}^{N} b_i = \sum_{i=1}^{N}(a_i + b_i) = \sum_{k=1}^{N}(a_k + b_k).$$

However,

Box 9.2.1. When multiplying two sums (not necessarily of equal length), it is essential to choose two different dummy indices for the two sums.
Thus, to multiply $\sum_{i=1}^{N} a_i$ by $\sum_{j=1}^{M} b_j$, one writes

$$\left(\sum_{i=1}^{N} a_i\right)\left(\sum_{j=1}^{M} b_j\right) = \sum_{i=1}^{N}\sum_{j=1}^{M} a_i b_j.$$

Failure to obey this simple rule can lead to catastrophe. For example, one may end up with $\sum_{i=1}^{N} a_i \sum_{i=1}^{N} b_i = \sum_{i=1}^{N} a_i b_i$, which is a sum of terms of the form $a_1b_1 + a_2b_2 + \cdots$, excluding terms such as $a_1b_2$ or $a_3b_5$, etc.

The freedom of choice for the symbol of dummy index can be used to manipulate sums and get results very quickly. As an example, suppose that $\{a_{ij}\}$ is a set of (doubly indexed) numbers which are symmetric under interchange of their indices, i.e., $a_{ij} = a_{ji}$. Similarly, suppose that $b_{ij}$ are antisymmetric under interchange of their indices, i.e., $b_{ij} = -b_{ji}$. Furthermore, assume that $i$ and $j$ have the lower limit of 1 and the upper limit of $n$. What is $\sum_{i=1}^{n}\sum_{j=1}^{n} a_{ij}b_{ij}$? Call this sum $S$. Since the choice of the dummy symbol is irrelevant, we have

$$S = \sum_{i=1}^{n}\sum_{j=1}^{n} a_{ij}b_{ij} = \sum_{\alpha=1}^{n}\sum_{\beta=1}^{n} a_{\alpha\beta}b_{\alpha\beta} = -\sum_{\alpha=1}^{n}\sum_{\beta=1}^{n} a_{\beta\alpha}b_{\beta\alpha}, \tag{9.5}$$

where we used the symmetry of $a_{ij}$ and the antisymmetry of $b_{ij}$. Since the order of summation is irrelevant, we can write $S$ as $S = -\sum_{\beta=1}^{n}\sum_{\alpha=1}^{n} a_{\beta\alpha}b_{\beta\alpha}$. Once again, change the dummy symbols: choose $i$ for $\beta$ and $j$ for $\alpha$. Then Equation (9.5) becomes

$$S = -\sum_{i=1}^{n}\sum_{j=1}^{n} a_{ij}b_{ij} = -S \;\Rightarrow\; 2S = 0 \;\Rightarrow\; S = 0.$$
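A quick numerical sanity check of $S = 0$: the Python sketch below builds a symmetric array $a_{ij}$ and an antisymmetric array $b_{ij}$ from a random matrix (the constructions $m + m^T$ and $m - m^T$ and the size $n = 5$ are illustrative choices) and evaluates the double sum directly. In floating point the result is tiny rather than guaranteed to be exactly zero.

```python
import random

def contract(a, b):
    """S = sum over i and j of a[i][j] * b[i][j]."""
    n = len(a)
    return sum(a[i][j] * b[i][j] for i in range(n) for j in range(n))

random.seed(0)
n = 5
m = [[random.uniform(-1, 1) for _ in range(n)] for _ in range(n)]
a = [[m[i][j] + m[j][i] for j in range(n)] for i in range(n)]   # a_ij = a_ji
b = [[m[i][j] - m[j][i] for j in range(n)] for i in range(n)]   # b_ij = -b_ji
print(contract(a, b))   # zero up to rounding
print(contract(a, a))   # for contrast, a generic contraction is not zero
```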

As another illustration, suppose we want to multiply $\sum_{i=0}^{M} a_i t^i$ by $\sum_{j=0}^{N} b_j t^j$ and express the coefficient of a typical power of $t$ in the product in terms of $a_i$ and $b_j$. Call the product $P$. Then

$$P = \sum_{i=0}^{M} a_i t^i \sum_{j=0}^{N} b_j t^j = \sum_{i=0}^{M}\sum_{j=0}^{N} a_i b_j t^{i+j}.$$

We need to use a single symbol for the power of $t$ in the double sum. So, let $\alpha = i + j$. Our goal is to write $P = \sum_\alpha c_\alpha t^\alpha$, find $c_\alpha$ in terms of $a_i$ and $b_j$, and determine the lower and upper limits of the summation on $\alpha$. The latter is easy: $\alpha$ has a lower limit of 0 (when both $i$ and $j$ are zero), and an upper limit of $M + N$.

For the second dummy index we choose one of the original indices, say $i$. The limits of $i$ cannot be the original limits, because $i$ is now mixed up with $\alpha$ and $j$ through $j = \alpha - i$. Because of the original bounds of $i$ and $j$, we have $0 \le i \le M$ as well as

$$0 \le \alpha - i \le N \quad\text{or}\quad -\alpha \le -i \le N - \alpha \quad\text{or}\quad \alpha \ge i \ge \alpha - N.$$

Since $i$ is greater than or equal to both 0 and $\alpha - N$, it must be greater than or equal to the maximum of the two: $i \ge \max(0, \alpha - N)$. This means that the lower limit of the $i$-summation is $\max(0, \alpha - N)$. Similarly, since $i$ is smaller than or equal to both $M$ and $\alpha$, it must be smaller than or equal to the minimum of the two: $i \le \min(M, \alpha)$, making the upper limit of the $i$-summation $\min(M, \alpha)$. We therefore have

$$P = \sum_{\alpha=0}^{M+N}\ \sum_{i=\max(0,\alpha-N)}^{\min(M,\alpha)} a_i b_{\alpha-i}\, t^\alpha = \sum_{\alpha=0}^{M+N}\left(\sum_{i=\max(0,\alpha-N)}^{\min(M,\alpha)} a_i b_{\alpha-i}\right) t^\alpha, \tag{9.6}$$

so the coefficient $c_\alpha$ is the sum in parentheses.
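Equation (9.6) translates directly into a loop over $\alpha$ with the index limits $\max(0, \alpha - N)$ and $\min(M, \alpha)$. The Python sketch below (function names are illustrative) computes the coefficients $c_\alpha$ this way and compares them with the unrestricted double sum over $i$ and $j$. Coefficients are stored lowest power first, so `a[i]` is $a_i$.

```python
def product_coefficients(a, b):
    """c_alpha of (sum_i a_i t^i)(sum_j b_j t^j), computed as in Eq. (9.6)."""
    M, N = len(a) - 1, len(b) - 1
    return [sum(a[i] * b[alpha - i]
                for i in range(max(0, alpha - N), min(M, alpha) + 1))
            for alpha in range(M + N + 1)]

def product_coefficients_naive(a, b):
    """The same coefficients from the unrestricted double sum over i and j."""
    c = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            c[i + j] += ai * bj
    return c

a = [1, 2, 3]         # 1 + 2t + 3t^2      (M = 2)
b = [4, 0, -1, 5]     # 4 - t^2 + 5t^3     (N = 3)
print(product_coefficients(a, b))                                      # [4, 8, 11, 3, 7, 15]
print(product_coefficients(a, b) == product_coefficients_naive(a, b))  # True
```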


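Example 9.2.1. As further practice in working with the summation symbol, we show that the torque on a collection of particles is caused by external forces only. The torques due to the internal forces add up to zero. We have already illustrated this for three particles in Example 1.3.5. Here, we generalize the result to any number of particles.

We use the second formula in Equation (1.31) and separate the forces:

$$\mathbf{T} = \sum_{k=1}^{N} \mathbf{r}_k \times \mathbf{F}_k = \sum_{k=1}^{N} \mathbf{r}_k \times \Big(\mathbf{F}_k^{(\text{ext})} + \sum_{i\neq k} \mathbf{F}_{ki}\Big) = \underbrace{\sum_{k=1}^{N} \mathbf{r}_k \times \mathbf{F}_k^{(\text{ext})}}_{\equiv\,\mathbf{T}^{(\text{ext})}} + \underbrace{\sum_{k=1}^{N}\sum_{i\neq k} \mathbf{r}_k \times \mathbf{F}_{ki}}_{\equiv\,\mathbf{T}^{(\text{int})}}.$$

We need to show that the double sum is zero. To do so, we break the inner sum into two parts, $i > k$ and $i < k$. This yields

$$\mathbf{T}^{(\text{int})} = \sum_{\substack{i,k=1\\ i\neq k}}^{N} \mathbf{r}_k \times \mathbf{F}_{ki} = \sum_{\substack{i,k=1\\ i>k}}^{N} \mathbf{r}_k \times \mathbf{F}_{ki} + \sum_{\substack{i,k=1\\ i<k}}^{N} \mathbf{r}_k \times \mathbf{F}_{ki} = \sum_{\substack{i,k=1\\ i>k}}^{N} \mathbf{r}_k \times \mathbf{F}_{ki} - \sum_{\substack{i,k=1\\ i<k}}^{N} \mathbf{r}_k \times \mathbf{F}_{ik},$$

because, by the third law of motion, $\mathbf{F}_{ik} = -\mathbf{F}_{ki}$. Now, in the second sum, change the dummy indices twice (first $i \to \beta$ and $k \to \alpha$, then $\alpha \to i$ and $\beta \to k$):

$$\mathbf{T}^{(\text{int})} = \sum_{\substack{i,k=1\\ i>k}}^{N} \mathbf{r}_k \times \mathbf{F}_{ki} - \sum_{\substack{\alpha,\beta=1\\ \alpha>\beta}}^{N} \mathbf{r}_\alpha \times \mathbf{F}_{\beta\alpha} = \sum_{\substack{i,k=1\\ i>k}}^{N} \mathbf{r}_k \times \mathbf{F}_{ki} - \sum_{\substack{i,k=1\\ i>k}}^{N} \mathbf{r}_i \times \mathbf{F}_{ki} = \sum_{\substack{i,k=1\\ i>k}}^{N} (\mathbf{r}_k - \mathbf{r}_i) \times \mathbf{F}_{ki}.$$

As in Example 1.3.5, we assume that $\mathbf{F}_{ki}$ and $\mathbf{r}_k - \mathbf{r}_i$ lie along the same line, in which case the cross products in the sum are all zero. ■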
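The conclusion of Example 9.2.1 can be checked numerically. The Python sketch below places a few particles at random positions and lets every pair interact through a central force $\mathbf{F}_{ki} = s_{ik}(\mathbf{r}_k - \mathbf{r}_i)$ with a symmetric strength $s_{ik} = s_{ki}$, so that $\mathbf{F}_{ik} = -\mathbf{F}_{ki}$ and each force lies along $\mathbf{r}_k - \mathbf{r}_i$, as the example assumes. It then adds up $\sum_k\sum_{i\neq k}\mathbf{r}_k\times\mathbf{F}_{ki}$. The random strengths and the number of particles are arbitrary illustrative choices.

```python
import random

def cross(u, v):
    """Cross product of two 3-vectors given as tuples."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def internal_torque(positions, strength):
    """Sum over k and i != k of r_k x F_ki, with the central force
    F_ki = strength[(min, max)] * (r_k - r_i) exerted on particle k by particle i."""
    total = [0.0, 0.0, 0.0]
    for k, rk in enumerate(positions):
        for i, ri in enumerate(positions):
            if i == k:
                continue
            s = strength[(min(i, k), max(i, k))]       # symmetric, so F_ik = -F_ki
            F_ki = tuple(s * (rk[d] - ri[d]) for d in range(3))
            for d, t in enumerate(cross(rk, F_ki)):
                total[d] += t
    return total

random.seed(1)
N = 6
positions = [tuple(random.uniform(-1, 1) for _ in range(3)) for _ in range(N)]
strength = {(i, k): random.uniform(-2, 2) for i in range(N) for k in range(i + 1, N)}
print(internal_torque(positions, strength))   # each component vanishes up to rounding
```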
In the sequel, we shall have many occasions to use summations and manipulate them in ways similar to the above. The reader is urged to go through such manipulations with great care and diligence. The skill of summation techniques is acquired only through such diligent pursuit.

9.2.1 Mathematical Induction 

Many a time it is desirable to make a mathematical statement that is true 
for all natural numbers. For example, we may want to establish a formula 
involving an integer parameter that will hold for all positive integers. One 
encounters this situation when, after experimenting with the first few positive 
integers, one recognizes a pattern and discovers a formula, and wants to make 
sure that the formula holds for all natural numbers. For this purpose, one 
uses mathematical induction. The essence of mathematical induction is 
stated in 


Box 9.2.2. (Mathematical Induction). Suppose that there is associated with a natural number (positive integer) $n$ a statement $S_n$. Then $S_n$ is true for every positive integer provided the following two conditions hold:

1. $S_1$ is true.

2. If $S_m$ is true for some given positive integer $m$, then $S_{m+1}$ is also true.


We illustrate the use of mathematical induction by proving the binomial theorem:

$$(a + b)^m = \sum_{k=0}^{m} \binom{m}{k} a^{m-k} b^k = \sum_{k=0}^{m} \frac{m!}{k!\,(m-k)!}\, a^{m-k} b^k = a^m + m a^{m-1} b + \frac{m(m-1)}{2!} a^{m-2} b^2 + \cdots + m a b^{m-1} + b^m, \tag{9.7}$$

where we have used the shorthand notation

$$\binom{m}{k} \equiv \frac{m!}{k!\,(m-k)!}. \tag{9.8}$$

The mathematical statement $S_m$ is Equation (9.7). We note that $S_1$ is trivially true: $(a + b)^1 = a + b$. Now we assume that $S_m$ is true and show that $S_{m+1}$ is also true. This means starting with Equation (9.7) and showing that

$$(a + b)^{m+1} = \sum_{k=0}^{m+1} \binom{m+1}{k} a^{m+1-k} b^k.$$

Then the induction principle ensures that the statement (equation) holds for all positive integers.

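Multiply both sides of Equation (9.7) by $a + b$ to obtain

$$(a + b)^{m+1} = \sum_{k=0}^{m} \binom{m}{k} a^{m-k+1} b^k + \sum_{k=0}^{m} \binom{m}{k} a^{m-k} b^{k+1}.$$

Now separate the $k = 0$ term from the first sum and the $k = m$ term from the second sum:

$$(a + b)^{m+1} = a^{m+1} + \sum_{k=1}^{m} \binom{m}{k} a^{m-k+1} b^k + \underbrace{\sum_{k=0}^{m-1} \binom{m}{k} a^{m-k} b^{k+1}}_{\text{let } k = j - 1 \text{ in this sum}} + b^{m+1}$$

$$= a^{m+1} + \sum_{k=1}^{m} \binom{m}{k} a^{m-k+1} b^k + \sum_{j=1}^{m} \binom{m}{j-1} a^{m-j+1} b^j + b^{m+1}.$$

The second sum in the last line involves $j$. Since this is a dummy index, we can substitute any symbol we please. The choice $k$ is especially useful because then we can unite the two summations. This gives

$$(a + b)^{m+1} = a^{m+1} + \sum_{k=1}^{m} \left[\binom{m}{k} + \binom{m}{k-1}\right] a^{m-k+1} b^k + b^{m+1}.$$

If we now use

$$\binom{m}{k} + \binom{m}{k-1} = \binom{m+1}{k},$$

which the reader can easily verify, we finally obtain

$$(a + b)^{m+1} = a^{m+1} + \sum_{k=1}^{m} \binom{m+1}{k} a^{m-k+1} b^k + b^{m+1} = \sum_{k=0}^{m+1} \binom{m+1}{k} a^{m+1-k} b^k.$$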

Mathematical induction is also used in defining quantities involving integers. Such definitions are called inductive definitions. For example, an inductive definition is used in defining powers: $a^1 = a$ and $a^m = a^{m-1}a$.
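An inductive definition maps naturally onto a recursive function. The Python sketch below (the name `power` is illustrative) follows $a^1 = a$ and $a^m = a^{m-1}a$ literally; it is meant to mirror the definition, not to be an efficient way of computing powers.

```python
def power(a, m):
    """a^m for a positive integer m, following the inductive definition
    a^1 = a and a^m = a^(m-1) * a."""
    if m < 1:
        raise ValueError("the inductive definition covers positive integers m only")
    if m == 1:
        return a                 # base case: a^1 = a
    return power(a, m - 1) * a   # inductive step: a^m = a^(m-1) a

print(power(3, 4), power(2, 10))
```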


9.3 Infinite Series 

An infinite series is an indicated sum of the members of a sequence $\{a_k\}_{k=1}^{\infty}$. This sum is written as

$$a_1 + a_2 + a_3 + \cdots = \sum_{k=1}^{\infty} a_k = \sum_{j=1}^{\infty} a_j = \sum_{n=1}^{\infty} a_n = \sum_{i=1}^{\infty} a_i,$$

where we have exploited the freedom of choice in using the dummy index as emphasized in the previous section.


Box 9.3.1. Associated with an infinite series is the sequence of partial sums $\{S_n\}_{n=1}^{\infty}$ with $S_n = a_1 + a_2 + \cdots + a_n = \sum_{k=1}^{n} a_k$. A series is convergent (divergent) if its associated sequence of partial sums converges (diverges).


For a convergent series the nth member of the sequence of partial sums will 
be a good approximation to the series if n is large enough. This is a simple 
but important property of the series that is very useful in practice. It should 
be clear that the convergence property of a series is not affected by changing 
a finite number of terms in the series. Convergent series can be added or 
multiplied by a constant to obtain new convergent series. In other words, if $\sum_{n=1}^\infty a_n = A$ and $\sum_{n=1}^\infty b_n = B$, then

$$\sum_{n=1}^\infty (a_n \pm b_n) = A \pm B, \qquad \sum_{n=1}^\infty r\,a_n = rA,$$

for any real number $r$.
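The following Python sketch gives a concrete feel for partial sums and for the linearity rules above. It sums two series whose values are known from the geometric series treated later in this section: $\sum_{n\ge1} 2^{-n} = 1$ and $\sum_{n\ge1} 3^{-n} = 1/2$. The helper `partial_sum` is a hypothetical name of ours.

```python
def partial_sum(term, n):
    """S_n = term(1) + term(2) + ... + term(n)."""
    return sum(term(k) for k in range(1, n + 1))

a = lambda n: 1 / 2 ** n   # sum_{n>=1} a_n = A = 1
b = lambda n: 1 / 3 ** n   # sum_{n>=1} b_n = B = 1/2
r = 4.0

for n in (5, 10, 40):
    s_plus = partial_sum(lambda k: a(k) + b(k), n)   # tends to A + B = 1.5
    s_scaled = partial_sum(lambda k: r * a(k), n)    # tends to r*A = 4.0
    print(n, s_plus, s_scaled)
```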

9.3.1 Tests for Convergence 

When adding, subtracting, or multiplying finite sums, no problem occurs 
because these operations are all well defined for a finite number of terms. 
However, when adding an infinite number of terms, no operation on the infinite 
sum will be defined unless the series converges. It is therefore important to 
have criteria to test whether a series converges or not. We list various tests 
which are helpful in determining whether an infinite series is convergent or 
not. 


The nth Term Test 


If $\lim_{n\to\infty} a_n \ne 0$, then $\sum_{n=1}^\infty a_n$ diverges. This is easily shown by looking at the difference $S_n - S_{n-1}$ and noting two facts: the difference is simply $a_n$, and if the series converges, the difference must approach zero by the Cauchy criterion. Thus none of the following series converges:

$$\sum_{k=1}^\infty (-1)^k\frac{k-1}{5k-1}, \qquad \sum_{i=1}^\infty (-1)^i, \qquad \sum_{m=1}^\infty \frac{m^2-10}{8m^2+1}.$$


if the infinite 
series is to 
converge, its nth 
term must 
approach zero. 
But that by itself 
is not enough for 
convergence! 


On the other hand, the series

$$\sum_{n=1}^\infty \frac{n}{n^2+1}, \qquad \sum_{k=1}^\infty \frac{(-1)^k}{k}, \qquad \sum_{j=1}^\infty \frac{1}{j}, \qquad \sum_{m=1}^\infty \frac{1}{m^2}$$

may or may not converge: the approach of $a_n$ to zero does not guarantee the convergence of the series. In fact, the first and third of the series above diverge while the second and last converge.


Box 9.3.2. Do not confuse the convergence of an infinite series with the 
convergence of its nth term. If the nth term converges to anything but 
zero, the series will not converge! 


Absolute Convergence

If $\sum_{n=1}^\infty |a_n|$ converges, so does $\sum_{n=1}^\infty a_n$. The series is then said to be absolutely convergent. For example, the series $\sum_{k=1}^\infty (-1)^k/2^k$ converges because $\sum_{k=1}^\infty 1/2^k$ converges. However, although the series $\sum_{k=1}^\infty 1/k$ can be shown to diverge, $\sum_{k=1}^\infty (-1)^k/k$ is known to converge.

Comparison Test

If $|a_n| \le b_n$ for large enough values of $n$ and $\sum_{n=1}^\infty b_n$ converges, then $\sum_{n=1}^\infty a_n$ is absolutely convergent. If moreover $|a_n| \le b_n$ for every $n$, then $\left|\sum_{n=1}^\infty a_n\right| \le \sum_{n=1}^\infty b_n$. On the other hand, if $a_n \ge b_n \ge 0$ for large values of $n$ and $\sum_{n=1}^\infty b_n$ diverges, then so does $\sum_{n=1}^\infty a_n$.

Integral Test

This is probably the most powerful test of convergence for infinite series. Assume that $\lim_{n\to\infty} a_n = 0$, so that the series is at least a candidate for convergence. Now find a function $f$ that expresses $a_n$, i.e., such that $f(n) = a_n$, and assume that $f(t)$ is positive and decreases monotonically for large values of $t$. Then

Theorem 9.3.1. The series $\sum_{n=1}^\infty a_n$ converges if and only if the integral $\int_c^\infty f(t)\,dt$ exists and is finite for some real number $c \ge 1$.

To see this, refer to Figure 9.2 and suppose that $c$ lies between two consecutive positive integers $m$ and $m+1$. The convergence or divergence of a series is not affected by the removal of a finite number of terms, so we are allowed to consider either the series $\sum_{k=m}^\infty a_k$ or $\sum_{k=m+1}^\infty a_k$. Figure 9.2(a) compares the area under the curve $f(t)$ with the shaded area. The shaded area is the sum of the areas of an infinite number of rectangles, each of height $f(k) = a_k$ for some positive integer $k$ larger than or equal to $m+1$. The width of all rectangles is unity. The shaded area $A$ is therefore

$$A = \sum_{k=m+1}^\infty f(k)\,\Delta t = \sum_{k=m+1}^\infty a_k \cdot 1 = \sum_{k=m+1}^\infty a_k.$$

It is clear from Figure 9.2(a) that

$$\sum_{k=m+1}^\infty a_k < \int_c^\infty f(t)\,dt.$$
Figure 9.2: The area under the curve (a) bounds, and (b) is bounded by, the infinite sum obtained from the series by removing a finite number of terms. This finite number of terms is the first $m$ terms for (a) and the first $m-1$ terms for (b).


Similarly, Figure 9.2(b) shows that $\sum_{k=m}^\infty a_k$ is larger than the area under the curve. We thus can write

$$\sum_{k=m+1}^\infty a_k < \int_c^\infty f(t)\,dt < \sum_{k=m}^\infty a_k.$$

Hence, if the integral is finite, $\sum_{k=m+1}^\infty a_k$ (being smaller than the integral) is also finite and the series converges. If, on the other hand, the integral is infinite, then $\sum_{k=m}^\infty a_k$ (being larger than the integral) diverges.
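The following Python sketch checks the sandwich inequality numerically for $f(t) = 1/t^2$. It takes $c = m$, an integer and thus a special case of the argument above. It evaluates the tails exactly using the known value $\sum_{k\ge1} 1/k^2 = \pi^2/6$. The variable names are ours.

```python
import math

# f(t) = 1/t^2 is positive and decreasing, and a_k = f(k) = 1/k^2.
m = 3
integral = 1.0 / m                                  # integral of dt/t^2 from m to infinity
zeta2 = math.pi ** 2 / 6                            # sum_{k>=1} 1/k^2
head = sum(1.0 / k**2 for k in range(1, m))         # a_1 + ... + a_{m-1}
tail_from_m = zeta2 - head                          # sum_{k=m}^infinity a_k
tail_from_m_plus_1 = tail_from_m - 1.0 / m**2       # sum_{k=m+1}^infinity a_k

print(tail_from_m_plus_1, integral, tail_from_m)    # about 0.284, 0.333, 0.395
assert tail_from_m_plus_1 < integral < tail_from_m
```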

The integral test leads directly to the observation that the Riemann zeta function (also called the harmonic series of order $p$) converges for $p > 1$ and diverges for $p \le 1$. It is defined by

$$\zeta(p) = \sum_{k=1}^\infty \frac{1}{k^p} = 1 + \frac{1}{2^p} + \frac{1}{3^p} + \cdots. \qquad (9.9)$$

In particular,

$$\zeta(1) = \sum_{k=1}^\infty \frac{1}{k} = 1 + \frac{1}{2} + \frac{1}{3} + \cdots,$$

called simply the harmonic series, diverges.
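Partial sums make the contrast between $p = 1$ and $p = 2$ clear. In the following Python sketch, $H_n - \ln n$ levels off near the Euler-Mascheroni constant (about 0.5772). So $H_n$ itself grows without bound like $\ln n$. By contrast, the $p = 2$ partial sums settle near $\pi^2/6$. The name `zeta_partial` is our own helper.

```python
import math

def zeta_partial(p, n):
    """Partial sum 1 + 1/2^p + ... + 1/n^p of the harmonic series of order p."""
    return sum(1.0 / k**p for k in range(1, n + 1))

for n in (10, 1_000, 100_000):
    h = zeta_partial(1, n)     # harmonic series: grows like ln(n)
    z2 = zeta_partial(2, n)    # order 2: settles near pi^2/6
    print(n, h, h - math.log(n), z2, math.pi**2 / 6)
```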

Ratio Test

Consider the series $\sum_{n=1}^\infty a_n$. If $a_n \ne 0$ for large enough $n$ and

$$\lim_{n\to\infty}\left|\frac{a_{n+1}}{a_n}\right| = R,$$

then the series is absolutely convergent if $R < 1$ and is divergent if $R > 1$. If $R = 1$, the test is inconclusive.


The terms that we choose for the ratio test need not be consecutive. To see this, note that

$$\lim_{n\to\infty}\left|\frac{a_{n+2}}{a_n}\right| = \lim_{n\to\infty}\left|\frac{a_{n+2}}{a_{n+1}}\right| \cdot \lim_{n\to\infty}\left|\frac{a_{n+1}}{a_n}\right| = \left(\lim_{n\to\infty}\left|\frac{a_{n+1}}{a_n}\right|\right)^2.$$

In going to the last equality, we have used the following:

$$\lim_{n\to\infty}\left|\frac{a_{n+2}}{a_{n+1}}\right| = \lim_{(m-1)\to\infty}\left|\frac{a_{m+1}}{a_m}\right| = \lim_{m\to\infty}\left|\frac{a_{m+1}}{a_m}\right| = \lim_{n\to\infty}\left|\frac{a_{n+1}}{a_n}\right|.$$

Here we have substituted $m = n + 1$ and used Equation (9.3), together with the fact that $m \to \infty$ if and only if $(m-1) \to \infty$. It now follows that

$$\lim_{n\to\infty}\left|\frac{a_{n+1}}{a_n}\right| = \sqrt{\lim_{n\to\infty}\left|\frac{a_{n+2}}{a_n}\right|},$$

and the LHS will be less than or greater than one if the term inside the square root sign is. In fact, one can generalize the above argument and state that the series is convergent (divergent) if

$$\lim_{n\to\infty}\left|\frac{a_{n+j}}{a_n}\right| = \left(\lim_{n\to\infty}\left|\frac{a_{n+1}}{a_n}\right|\right)^j \qquad (9.10)$$

is less than (greater than) one for any finite $j$.
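Equation (9.10) is easy to probe numerically. The following Python sketch uses the illustrative terms $a_n = n^3/2^n$ (our choice, not from the text), for which $R = 1/2$. It compares the ratio of terms three apart with the cube of the consecutive ratio.

```python
def ratio(a, n, j=1):
    """|a(n+j) / a(n)|, the quantity whose limit appears in Eq. (9.10)."""
    return abs(a(n + j) / a(n))

a = lambda n: n**3 / 2**n    # illustrative terms with R = 1/2

for n in (10, 100, 1000):
    # ratio(a, n, 3) and ratio(a, n) ** 3 both approach R**3 = 0.125
    print(n, ratio(a, n), ratio(a, n, j=3), ratio(a, n) ** 3)
```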

The Riemann zeta function can sharpen the ratio test of convergence to allow for certain cases in which the ratio is one. Instead of taking the complete limit, we approximate the ratio of consecutive terms for the Riemann zeta function to first order in $1/n$. This yields

$$\frac{a_{n+1}}{a_n} = \frac{n^p}{(n+1)^p} = \left(1 + \frac{1}{n}\right)^{-p} \approx 1 - \frac{p}{n},$$

where we used the binomial expansion formula, to which we shall come back [see Equation (10.15)]. We know that such a ratio leads to a convergent series if $p > 1$ and to a divergent series if $p < 1$. Therefore, we obtain

Theorem 9.3.2. (Generalized Ratio Test). If the ratio of consecutive terms of a series satisfies $a_{n+1}/a_n \to 1 - p/n$, then the series converges if $p > 1$ and diverges if $p < 1$.
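Theorem 9.3.2 is essentially what is usually called Raabe's test. The following Python sketch (helper name ours) shows the quantity it is built on. It rearranges $a_{n+1}/a_n \approx 1 - p/n$ into $p \approx n(1 - a_{n+1}/a_n)$ and evaluates that at a large $n$ for $a_n = 1/n^p$.

```python
def raabe_estimate(a, n):
    """Estimate p from a(n+1)/a(n) ~ 1 - p/n, i.e. p ~ n * (1 - a(n+1)/a(n))."""
    return n * (1.0 - a(n + 1) / a(n))

for p in (0.5, 1.0, 2.0):
    a = lambda n, p=p: 1.0 / n**p     # harmonic series of order p
    print(p, raabe_estimate(a, 10**6))  # close to p
```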


Alternating Series Test 

An alternating series

$$a_1 - a_2 + a_3 - a_4 + \cdots = \sum_{j=1}^\infty (-1)^{j+1}a_j, \qquad a_j > 0,$$

converges if $\lim_{j\to\infty} a_j = 0$ and if there exists a positive integer $N$ such that $a_k \ge a_{k+1}$ for all $k > N$.
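A standard companion to this test, which is a known fact though not stated in the text, is that the partial sums alternately overshoot and undershoot the limit. As a result, the error after $n$ terms is at most $a_{n+1}$. The following Python sketch checks this for $a_j = 1/j$, whose sum is $\ln 2$. The helper name is ours.

```python
import math

def alternating_partial(a, n):
    """S_n = a(1) - a(2) + a(3) - ... + (-1)^(n+1) a(n)."""
    return sum((-1) ** (j + 1) * a(j) for j in range(1, n + 1))

a = lambda j: 1.0 / j      # positive, decreasing, tends to 0
limit = math.log(2)        # value of this series (see Example 9.4.4)

for n in (10, 11, 1000, 1001):
    s = alternating_partial(a, n)
    # Even-n partial sums lie below the limit, odd-n ones above it,
    # and the error never exceeds the first omitted term a(n+1).
    print(n, s, s < limit, abs(s - limit) <= a(n + 1))
```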
Example 9.3.3. A useful series is the geometric series:

$$b + bu + bu^2 + bu^3 + \cdots = \sum_{k=0}^\infty bu^k.$$

We claim that this series converges to $b/(1-u)$ if $|u| < 1$, and diverges if $|u| \ge 1$. To show this, let $S_n$ represent the sum of the terms from $b$ through $bu^n$, so that $\{S_n\}_{n=0}^\infty$ is the sequence of partial sums. We calculate $S_n$ as follows. First note that

$$S_n = \sum_{k=0}^n bu^k \implies uS_n = \sum_{k=0}^n bu^{k+1}.$$

Next separate the zeroth term from the rest of $S_n$ and rewrite it as

$$S_n = b + \sum_{k=1}^n bu^k = b + \sum_{m=0}^{n-1} bu^{m+1} = b + \sum_{k=0}^{n-1} bu^{k+1}.$$

In the second equality we changed $k$ to $m = k - 1$, and in the last equality we changed the dummy index back to $k$. Subtracting $uS_n$ from $S_n$, we obtain

$$(1-u)S_n = b + \sum_{k=0}^{n-1} bu^{k+1} - \sum_{k=0}^{n} bu^{k+1} = b + \sum_{k=0}^{n-1} bu^{k+1} - \sum_{k=0}^{n-1} bu^{k+1} - bu^{n+1} = b - bu^{n+1},$$

so that

$$S_n = \frac{b - bu^{n+1}}{1-u}.$$

It is now clear that $u^{n+1} \to 0$ for $n \to \infty$ if and only if $|u| < 1$, in which case $S_n \to b/(1-u)$. For $|u| > 1$, the series clearly diverges. For $|u| = 1$ there are two cases. When $u = 1$, the partial sum is $S_n = (n+1)b$, which diverges for any nonzero $b$. When $u = -1$, the partial sum is $S_n = b\sum_{k=0}^n (-1)^k$, which bounces back and forth between $b$ and $0$ and never converges. So the series diverges for $|u| \ge 1$ (for $b \ne 0$).

For example, if $b = 0.3$ and $u = 0.1$, then the series gives

$$0.3 + 0.3 \times 0.1 + 0.3 \times 0.01 + \cdots = 0.33333\ldots = \frac{0.3}{1 - 0.1}.$$

For $b = 1$ the series gives

$$1 + u + u^2 + \cdots = \frac{1}{1-u} = (1-u)^{-1}, \qquad (9.11)$$

which can be thought of as the binomial expansion when the power is $-1$. As we shall see in Section 10.1, there is a generalization of binomial expansion for any real power. ■
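The closed form for $S_n$ can be compared with direct summation. The following short Python sketch (function names ours) does this for the numerical example above. It also shows the partial sums oscillating between $b$ and $0$ when $u = -1$.

```python
def geometric_partial(b, u, n):
    """S_n = b + b u + ... + b u^n, summed term by term."""
    return sum(b * u**k for k in range(n + 1))

def geometric_closed(b, u, n):
    """Closed form S_n = (b - b u^(n+1)) / (1 - u), valid for u != 1."""
    return (b - b * u ** (n + 1)) / (1 - u)

b, u = 0.3, 0.1
for n in (1, 5, 20):
    print(n, geometric_partial(b, u, n), geometric_closed(b, u, n))
print(b / (1 - u))                                        # the limit, 0.333...
print([geometric_partial(1, -1, n) for n in range(6)])    # [1, 0, 1, 0, 1, 0]
```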


The result of Example 9.3.3 is important enough to be summarized: 


Box 9.3.3. The series $b + bu + bu^2 + bu^3 + \cdots = \sum_{k=0}^\infty bu^k$ is called the geometric series. It converges to $b/(1-u)$ if $|u| < 1$, and diverges if $|u| \ge 1$.




Example 9.3.4. Another example of a series used often is

$$1 + 1 + \frac{1}{2!} + \frac{1}{3!} + \cdots = \sum_{k=0}^\infty \frac{1}{k!}.$$

The ratio test shows only that the series converges, but the comparison test gives us more information. In fact, since $1/n! \le 1/2^{n-1}$ for $n \ge 1$, we conclude that

$$\sum_{k=0}^\infty \frac{1}{k!} \le 1 + \left(1 + \frac{1}{2} + \frac{1}{2^2} + \frac{1}{2^3} + \cdots\right).$$

But the series in parentheses is the geometric series with $u = 1/2$, which is known to converge to 2. We thus obtain the upper bound to our series:

$$\sum_{k=0}^\infty \frac{1}{k!} \le 3.$$

It is well known that the series converges to $e = 2.718281828\ldots$ ■
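The comparison and the bound are easy to confirm numerically. Here is a minimal Python sketch using `math.factorial`; the helper name is ours.

```python
import math

def exp_series_partial(n):
    """1/0! + 1/1! + ... + 1/n!"""
    return sum(1.0 / math.factorial(k) for k in range(n + 1))

# The comparison used above: 1/k! <= 1/2^(k-1) for k >= 1
assert all(1 / math.factorial(k) <= 1 / 2 ** (k - 1) for k in range(1, 40))

for n in (2, 5, 10, 15):
    print(n, exp_series_partial(n))   # increases, stays below 3, approaches e
print(math.e)
```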


Example 9.3.5. If one alternates the sign of the terms in the harmonic series, one obtains the series

$$\sum_{k=1}^\infty \frac{(-1)^{k+1}}{k} = 1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \cdots,$$

which is convergent by the alternating series test. In fact, we shall show in Example 9.4.4 that the series converges to $\ln 2$. Note that the series is not absolutely convergent. A convergent series that does not converge absolutely is called conditionally convergent. ■


The invention of calculus motivated several other areas of investigation in mathematics. One of these areas was infinite series. For example, it was not always possible to find a closed formula for the integral of a function. So, it was common to expand the integrand in powers of the variable and integrate the resulting infinite series. No question was asked as to the legitimacy of the operations performed. In fact, Newton, Leibniz, and Euler regarded infinite series as an extension of the algebra of polynomials. They did not realize that new problems would arise if a finite sum were extended to an infinite series. However, the apparent difficulties that did arise occasionally caused them to bring up the question of convergence and divergence.

Some mathematicians of the seventeenth century had observed the difference 
between convergence and divergence. In 1668 Lord Brouncker, while studying the
relation between $\ln x$ and the area under $y = 1/x$, demonstrated the convergence
of the series for $\ln 2$ and ln(|) by comparison with a geometric series. Newton and
James Gregory, who made much use of the numerical values of series to calculate 
logarithmic and other function tables and to evaluate integrals, were aware that the 
sum of a series can be finite or infinite. The terms “convergent” and “divergent” 
were actually used by Gregory in 1668, but he did not develop the ideas. 

Leibniz, too, felt some concern about convergence. In a letter of October 25, 1713 to John Bernoulli, he noted what is now a theorem that we call the alternating series test. Maclaurin used series as a regular method for integration. He recognized that the terms of a convergent series must continually decrease and become less than any given quantity, no matter how small.

D’Alembert also distinguished convergent from divergent series. In his article “Serie” in the Encyclopédie, he describes a convergent series as one that approaches a finite value and consequently has terms that keep diminishing. In this same volume, d’Alembert gave a test for the absolute convergence of the series $\sum_{k=1}^\infty a_k$. His test states that the series converges absolutely if, for all $k > N$, the ratio satisfies $|a_{k+1}/a_k| < r$, where $r$ is a positive number independent of $k$ and less than 1.

Edward Waring (1734-1798), Lucasian professor of mathematics at Cambridge 
University, held advanced views on convergence. He showed that the harmonic series 
of order $p$ converges if $p > 1$ and diverges if $p \le 1$. He also gave the well-known test
for convergence and divergence, now known as the ratio test. 


9.3.2 Operations on Series 

It has already been mentioned that convergent series can be added, subtracted, 
and multiplied by a constant. There are other important operations one can 
perform on convergent series. These operations may be “obvious” for finite 
sums, but they have to be justified for infinite series. In fact, performing such 
obvious operations on divergent series leads to contradictory results. 

One such operation is grouping: 


Box 9.3.4. One can group the terms of a finite sum or a convergent 
infinite series in any way one desires, and the sum will not change. 


The operation of grouping is essentially putting parentheses around a collection of terms of the series (or the sum), adding the terms inside each pair of parentheses first, and then adding the results. This is simply the associative property of addition. It turns out that this associative property of addition does not apply to divergent infinite series.² For example, $\sum_{m=0}^\infty (-1)^m$ gives an infinite number of zeros if every $+1$ is grouped with one $-1$. On the other hand, the same series can be grouped such that the first $+1$ is set aside and the rest of the terms are paired. The result would then be a $+1$ followed by an infinite number of zeros. If the partial sums of a divergent series grow without bound, so that the sum is infinite, then any grouping of terms gives infinity.

Another operation is the rearrangement of terms of a series. This is the 
commutative property of addition: 


Box 9.3.5. If a series is absolutely convergent then the rearrangement 
of terms does not change either the nature of convergence or the limit of 
the series. A conditionally convergent series does not share this property. 


2 Caution is to be exercised not to move the terms around, as this will, in general, affect 
the sum as explained in the property of rearrangement described below. 




warning! 

rearranging terms 
is not, in general, 
allowed! 
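To see the importance of absolute convergence, consider the alternating series $\sum_{k=1}^\infty (-1)^{k+1}/k$, which converges conditionally to $\ln 2$, and rearrange terms as follows:

$$\sum_{k=1}^\infty \frac{(-1)^{k+1}}{k} = 1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \frac{1}{5} - \frac{1}{6} + \cdots$$
$$= 1 + \frac{1}{2} + \frac{1}{3} + \frac{1}{4} + \frac{1}{5} + \cdots - \frac{2}{2} - \frac{2}{4} - \frac{2}{6} - \cdots$$
$$= 1 + \frac{1}{2} + \frac{1}{3} + \frac{1}{4} + \cdots - \left(1 + \frac{1}{2} + \frac{1}{3} + \cdots\right) = 0.$$

In the second line, terms with even denominators have been added and subtracted, and the positive ones have been interspersed among terms with odd denominators. The result is absurd, since $\ln 2 \ne 0$. The manipulation splits a conditionally convergent series into two divergent series and then cancels them, which is not allowed.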
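Because that manipulation separates the series into two divergent pieces, it is better read as a warning than as a computation. A cleaner illustration of Box 9.3.5 is a classical example not taken from the text: the rearrangement $1 - \frac12 - \frac14 + \frac13 - \frac16 - \frac18 + \frac15 - \cdots$. Here one odd-denominator term is followed by two even-denominator terms. It uses every term exactly once, yet it converges to $\frac12\ln 2$. The following Python sketch (function name ours) sums it block by block.

```python
import math

def rearranged_partial(blocks):
    """Sum the first `blocks` groups of 1 - 1/2 - 1/4 + 1/3 - 1/6 - 1/8 + 1/5 - ...
    Group j is 1/(2j-1) - 1/(4j-2) - 1/(4j)."""
    total = 0.0
    for j in range(1, blocks + 1):
        total += 1.0 / (2 * j - 1) - 1.0 / (4 * j - 2) - 1.0 / (4 * j)
    return total

print(rearranged_partial(100_000), 0.5 * math.log(2))   # nearly equal
print(math.log(2))                                        # sum in the original order
```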

The third operation is multiplication of two series. As for rearrangement,

Box 9.3.6. Multiplication is defined only for absolutely convergent series: If the two series $\sum_{k=1}^\infty a_k$ and $\sum_{j=1}^\infty b_j$ are absolutely convergent, then their product $\left(\sum_{k=1}^\infty a_k\right)\left(\sum_{j=1}^\infty b_j\right) = \sum_{k=1}^\infty\sum_{j=1}^\infty a_k b_j = \sum_{i=1}^\infty c_i$ is also absolutely convergent.

The last series is a rearrangement of the terms $a_k b_j$ into a single sequence $\{c_i\}$. This rearrangement makes it necessary for the original series to be absolutely convergent.
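One concrete way to arrange the products $a_k b_j$ into a single sequence is to collect all products with the same index sum, which gives the Cauchy product. The following Python sketch assumes 0-based indexing, unlike the 1-based sums in Box 9.3.6. It squares the geometric series $\sum x^k$ with $x = 1/2$. The resulting coefficients are $(n+1)x^n$, and the product should approach $[1/(1-x)]^2 = 4$.

```python
def cauchy_product(a, b):
    """c_n = sum_{k=0}^{n} a_k b_{n-k} for finite coefficient lists a and b."""
    n_terms = min(len(a), len(b))
    return [sum(a[k] * b[n - k] for k in range(n + 1)) for n in range(n_terms)]

x = 0.5
a = [x**k for k in range(200)]      # geometric series, sum 1/(1 - x) = 2
c = cauchy_product(a, a)            # here c_n = (n + 1) x^n
print(c[:4])                        # [1.0, 1.0, 0.75, 0.5]
print(sum(c), (1 / (1 - x)) ** 2)   # both close to 4
```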


9.4 Sequences and Series of Functions 

The infinite series of the last section are useful when we want to approximate a number, such as $e$ or $\ln 2$, by a (large) sum of other (rational, decimal) numbers. Physics, however, deals with functions as well as numbers. It is therefore useful to know how to approximate functions in terms of “elementary” functions. In this section we shall investigate the possibility of expressing a given function in terms of a series of functions. Since functions give numbers once their arguments are assigned a value, many of the ideas developed in the preceding two sections will be employed.

Suppose for each natural number $n$ there is a function $f_n(x)$. Then, the set $\{f_n(x)\}_{n=1}^\infty$ is called a sequence of functions. Just as in the case of sequences of numbers, we need to address the question of the convergence of the sequence of functions. This reduces to the question of convergence of ordinary numbers once we substitute values for $x$. Variation of $f_n(x)$ with $x$ opens up the possibility of convergence for some values of $x$ and divergence for others. For instance, the sequence $\{x^n\}_{n=1}^\infty$ converges for $-1 < x \le 1$ and diverges for all other values of $x$.
More interesting than sequences of functions are series of functions:

$$f_1(x) + f_2(x) + f_3(x) + \cdots = \sum_{k=1}^\infty f_k(x).$$

The $n$th partial sum of such a series is

$$S_n(x) = f_1(x) + f_2(x) + \cdots + f_n(x) = \sum_{j=1}^n f_j(x).$$

The convergence of a series of functions $\sum_{k=1}^\infty f_k(x)$ depends on $x$. For example, the series may converge for $x = 0.35$. This means that the series of numbers $\sum_{k=1}^\infty f_k(0.35)$ converges. That is, there exists a real number $s$ such that for every $\epsilon > 0$ there exists an $N$ with the property that $\left|\sum_{k=1}^n f_k(0.35) - s\right| < \epsilon$ whenever $n > N$. It should be clear that an $N$ that works for one value of $x$ (here 0.35) and one $\epsilon$ may not work for other values of $x$ and $\epsilon$. Thus, $N$ depends on $x$ and $\epsilon$, and this dependence is denoted by $N(x, \epsilon)$.

We can imagine making a table with one column consisting of the values 
of x and a second column consisting of the corresponding limits of the series 
of numbers whose terms are f n evaluated at the value of x. The table then 
defines a real-valued function, say $S(x)$, which is called the limit of the series of functions, and one writes

$$S(x) = \lim_{n\to\infty} S_n(x) = \sum_{k=1}^\infty f_k(x). \qquad (9.12)$$

We have already seen examples of series of functions. One is the geometric series $\sum_{n=0}^\infty u^n$, convergent for $|u| < 1$, in which the terms are functions of $u$ with $f_n(u) = u^n$. Another is the Riemann zeta function (or harmonic series of order $p$), convergent for $p > 1$, in which the terms are functions of $p$ with $f_n(p) = 1/n^p$.

In general, the sum in Equation (9.12) may converge only for a limited range of values of $x$. To find this range, we impose the ratio test on the terms of the series. This yields

$$r(x) \equiv \lim_{k\to\infty}\left|\frac{f_{k+1}(x)}{f_k(x)}\right| < 1, \qquad (9.13)$$

which is an inequality in $x$ that can be solved to find the values of $x$ for which the series converges. The end points, where $r(x) = 1$, must be examined separately.

Example 9.4.1. As an example of the application of Equation (9.13), let us find the values of $x$ for which the series

$$\sum_{k=1}^\infty \frac{[\ln(x+1)]^k}{k}$$

converges. The ratio in (9.13) is

$$r(x) = \lim_{k\to\infty}\left|\frac{[\ln(x+1)]^{k+1}/(k+1)}{[\ln(x+1)]^k/k}\right| = \lim_{k\to\infty}\left|\frac{[\ln(x+1)]^{k+1}}{[\ln(x+1)]^k}\,\frac{k}{k+1}\right| = |\ln(x+1)|\lim_{k\to\infty}\frac{k}{k+1} = |\ln(x+1)|.$$

So, the condition for convergence is $|\ln(x+1)| < 1$, or

$$-1 < \ln(x+1) < 1 \implies e^{-1} < x+1 < e \implies e^{-1} - 1 < x < e - 1,$$

and the series converges for $-0.632 < x < 1.718$.

Let us now check the convergence of the series for the two end points. The left end point corresponds to $\ln(x+1) = -1$, for which the series becomes $\sum_{k=1}^\infty (-1)^k/k$. This series is convergent (see Example 9.3.5). On the other hand, for the right end point, $\ln(x+1) = 1$, and the series becomes $\sum_{k=1}^\infty 1/k$, which is the divergent harmonic series. Thus, the interval of convergence is $-0.632 \le x < 1.718$. ■
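The end points and the sum itself can be checked numerically. The following Python sketch (names ours) also uses the standard result $\sum_{k\ge1} y^k/k = -\ln(1-y)$ for $|y| < 1$, which is not derived in the text. It uses this result to compare a partial sum with the exact value at $x = 0.5$.

```python
import math

def series_941(x, n_terms):
    """Partial sum of sum_{k>=1} [ln(x+1)]^k / k."""
    y = math.log(x + 1)
    return sum(y**k / k for k in range(1, n_terms + 1))

left, right = math.exp(-1) - 1, math.e - 1
print(round(left, 3), round(right, 3))           # -0.632 1.718

x = 0.5                                          # inside the interval
exact = -math.log(1 - math.log(x + 1))           # uses sum y^k/k = -ln(1 - y)
print(series_941(x, 200), exact)
```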


An important notion is uniform convergence: 


Box 9.4.1. Suppose that, for every $\epsilon > 0$, it is possible to find an $N$ such that $|S_n(x) - S(x)| < \epsilon$ whenever $n > N$ for all values of $x$ in some interval $(a, b)$, so that $N$ is independent of $x$. Then the series is said to converge uniformly on $(a, b)$.


Uniform convergence is always relative to a specified range of values of $x$, and the range matters. A series may converge for all values of $x$ on the real line without converging uniformly on any interval of the real line. A pictorial representation of uniform convergence is shown in Figure 9.3. Basically, a series is uniformly convergent if, for all $n$ beyond a certain large $N$, the graphs of the partial sums $S_n(x)$ lie within a (narrow) strip of width $2\epsilon$ centered on the graph of the limit function $S(x)$.
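The geometric series $\sum_{k\ge0} x^k$ is a convenient test case, since the error of its partial sums is known exactly: $|S(x) - S_n(x)| = |x|^{n+1}/|1-x|$. The following Python sketch (names ours) evaluates the worst error over a grid of points. On $-0.9 \le x \le 0.9$ the worst error shrinks to zero as $n$ grows. On a grid reaching 0.999 it stays large for these $n$, and pushing the grid closer to 1 makes it as large as we like. This is why the convergence is uniform on the former interval but not on $-1 < x < 1$.

```python
def sup_error(n, xs):
    """Largest |S(x) - S_n(x)| over the points xs for S(x) = 1/(1 - x),
    using the exact remainder x^(n+1) / (1 - x) of the geometric series."""
    return max(abs(x ** (n + 1) / (1 - x)) for x in xs)

compact = [0.9 * i / 1000 for i in range(-1000, 1001)]     # -0.9 <= x <= 0.9
near_one = [0.999 * i / 1000 for i in range(-1000, 1001)]  # -0.999 <= x <= 0.999

for n in (10, 50, 200):
    print(n, sup_error(n, compact), sup_error(n, near_one))
```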

There is a useful test for uniform convergence that works for a large number of familiar series and goes by the name of the Weierstrass M-test. Let $\sum_{k=1}^\infty f_k(x)$ be a series of functions all defined in an interval³ $(a, b)$. If there is a convergent series of positive numbers $\sum_{k=1}^\infty M_k$ such that $|f_k(x)| \le M_k$ for all $x$ in $(a, b)$, then $\sum_{k=1}^\infty f_k(x)$ converges absolutely for each such $x$ and is uniformly convergent in $(a, b)$.

³ Instead of an interval, one may use the union of many intervals. In fact, the statement is true even when the interval $(a, b)$ is replaced with a general subset of the real line.

Example 9.4.2. Consider the series $\sum_{n=1}^\infty x^n/n^p$, which is a generalization of the geometric series (for which $p = 0$). We want to see for what values of $p$ and in what interval of $x$ the series is convergent. One way to get the answer is to apply the ratio test:


$$\lim_{n\to\infty}\left|\frac{a_{n+1}}{a_n}\right| = \lim_{n\to\infty}\left|\frac{x^{n+1}/(n+1)^p}{x^n/n^p}\right| = |x|\lim_{n\to\infty}\left(\frac{n}{n+1}\right)^p = |x|.$$

It follows that, regardless of the value of $p$, the series converges for $|x| < 1$ and diverges for $|x| > 1$. For $x = 1$, the series becomes $\sum_{n=1}^\infty 1/n^p$, which converges for $p > 1$ and diverges for $p \le 1$, as pointed out in the integral test of convergence. Finally, if $x = -1$, the alternating series test of convergence tells us that the series converges for all $p > 0$. What about the uniformity of convergence? We note that for $M_n = 1/n^p$, and for $|x| \le 1$, we have

$$\left|\frac{x^n}{n^p}\right| \le \frac{1}{n^p} = M_n,$$

and the series of $M_n$ converges as long as $p > 1$. Thus, for $p > 1$, the series $\sum_{n=1}^\infty x^n/n^p$ is uniformly convergent on $-1 \le x \le 1$. ■
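The M-test bound is independent of $x$, which is exactly what uniform convergence needs. The following Python sketch (names ours) takes $p = 2$. It finds the largest observed difference between a long partial sum and $S_{50}(x)$ on a grid in $-1 \le x \le 1$. It then compares that difference with the tail $\sum_{k>50} 1/k^2$, computed from $\pi^2/6$.

```python
import math

p, n = 2, 50
n_ref = 5000      # a long partial sum used as a stand-in for the full series

def s(x, terms):
    """Partial sum of sum_{k>=1} x^k / k^p."""
    return sum(x**k / k**p for k in range(1, terms + 1))

# Weierstrass bound: |S(x) - S_n(x)| <= sum_{k>n} 1/k^p for every |x| <= 1.
tail_bound = math.pi**2 / 6 - sum(1.0 / k**p for k in range(1, n + 1))

grid = [i / 100 for i in range(-100, 101)]
worst = max(abs(s(x, n_ref) - s(x, n)) for x in grid)
print(worst, tail_bound)   # the worst error over the grid stays below the x-independent bound
```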


9.4.1 Properties of Uniformly Convergent Series 


The importance of uniformly convergent series lies in the nice properties such series possess. For instance, suppose each $u_i(x)$ is continuous in the interval $a \le x \le b$ and the series $\sum_{i=1}^\infty u_i(x)$ is uniformly convergent in that interval. Then the function defined by $f(x) = \sum_{i=1}^\infty u_i(x)$ is also continuous in the interval. This statement is equivalent to saying that for $x$ and $x_0$ in the interval $(a, b)$, one has

$$\lim_{x\to x_0}\lim_{n\to\infty} S_n(x) = \lim_{n\to\infty}\lim_{x\to x_0} S_n(x).$$

Accordingly, uniform convergence permits the interchange of the two limit processes.
Another property, which is extremely useful in physical applications, is the fact that

Theorem 9.4.3. If $f(x) = \sum_{i=1}^\infty u_i(x)$ is uniformly convergent, and each $u_i(x)$ is continuous for $a \le x \le b$, then the series can be integrated term by term:

$$\int_a^b f(x)\,dx = \int_a^b\left(\sum_{i=1}^\infty u_i(x)\right)dx = \sum_{i=1}^\infty \int_a^b u_i(x)\,dx,$$

i.e., integration and summation can be interchanged.

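Example 9.4.4. Consider the geometric series $\sum_{i=0}^\infty t^i$. By the Weierstrass M-test with $M_i = r^i$, it converges uniformly on $-r \le t \le r$ for any $r < 1$, though not on the whole open interval $-1 < t < 1$. Changing $t$ to $-t$ does not change either the interval or the nature of convergence of the series. We thus have

$$\frac{1}{1+t} = \sum_{i=0}^\infty (-t)^i = \sum_{i=0}^\infty (-1)^i t^i. \qquad (9.14)$$

Because the series converges uniformly on $-|x| \le t \le |x|$, we can integrate both sides from 0 to $x$ with $-1 < x < 1$ to obtain

$$\ln(1+x) = \int_0^x \frac{dt}{1+t} = \sum_{i=0}^\infty (-1)^i \int_0^x t^i\,dt = \sum_{i=0}^\infty (-1)^i \frac{x^{i+1}}{i+1}.$$

With $x = 1$, we obtain the result alluded to in Example 9.3.5. Strictly speaking, $x = 1$ lies outside the interval used above. The step is justified by Abel's theorem, since the series on the right converges at $x = 1$.

Note that the integral of a series may be convergent for a bigger range of values of its argument than the original series. Here, the original series was divergent (for $t = 1$) while its integral converges (for $x = 1$). ■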
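Term-by-term integration can be checked against the standard-library function `math.log1p`, which computes $\ln(1+x)$. In the following Python sketch (helper name ours), a few hundred terms suffice well inside the interval. At the end point $x = 1$, by contrast, the error after $N$ terms is only of order $1/N$.

```python
import math

def log1p_series(x, n_terms):
    """sum_{i=0}^{n_terms-1} (-1)^i x^(i+1) / (i+1): the term-by-term integral."""
    return sum((-1) ** i * x ** (i + 1) / (i + 1) for i in range(n_terms))

for x in (0.5, -0.5, 0.9):
    print(x, log1p_series(x, 400), math.log1p(x))   # log1p(x) = ln(1 + x)

# At x = 1 the series still converges, but slowly, toward ln 2
print(log1p_series(1.0, 100_000), math.log(2))
```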


The property stated in Theorem 9.4.3 is a useful tool for the expansion 
of physical quantities in terms of some more “elementary” quantities. For 
example, one can expand the electric potential—usually given in terms of an 
integral—as a sum of the potentials of a single charge, a dipole, a quadrupole, 
etc. (see Section 10.5). In many physical situations only the first few terms 
of the series expansion will be of importance. Thus, for instance, in atomic 
transitions, it is only the dipole term that participates significantly. 

One can also differentiate a uniformly convergent series. To be specific, 

Theorem 9.4.5. Suppose that $u_n'(x) = du_n/dx$ is continuous for $a \le x \le b$, that the series $\sum_{n=1}^\infty u_n(x)$ converges to $f(x)$ for $a \le x \le b$, and that the series $\sum_{n=1}^\infty u_n'(x)$ converges uniformly for $a \le x \le b$. Then

$$f'(x) = \frac{d}{dx}\sum_{n=1}^\infty u_n(x) = \sum_{n=1}^\infty u_n'(x), \qquad a \le x \le b,$$

i.e., one can change the order of differentiation and summation.
Other operations defined on uniformly convergent series are addition, subtraction, and multiplication by a continuous function. If $\sum_{i=1}^\infty u_i(x)$ and $\sum_{i=1}^\infty v_i(x)$ are uniformly convergent for $a \le x \le b$ and $h(x)$ is continuous in the same interval, then the series

$$\sum_{i=1}^\infty [u_i(x) \pm v_i(x)], \qquad \sum_{i=1}^\infty h(x)\,u_i(x)$$

are also uniformly convergent for $a \le x \le b$.

The mathematicians of the seventeenth and eighteenth centuries used series indiscriminately. By the beginning of the nineteenth century some absurd results from
manipulating infinite series stirred up some interest in questioning the validity of 
operations performed on them. Around 1810 a number of mathematicians began 
the exact handling of infinite series. 

In his 1811 paper and his Analytical Theory of Heat, Fourier gave a satisfactory 
definition of convergence, though in general he worked freely with divergent series. 
His definition of convergence was essentially in terms of the sequence of partial sums. 
Moreover, he recognized that the convergence of a series of functions of the variable 
x may be achieved only in an interval of x values. Although Fourier stressed that 
a necessary condition for convergence is that the terms of the series approach zero, 
he was fooled by the series $1 - 1 + 1 - 1 + \cdots$ and thought that its sum was $\frac{1}{2}$ [substitute $t = 1$ on both sides of (9.14)].

The first important and strictly rigorous investigation of convergence was made 
by Gauss in his 1812 paper Disquisitiones Generales Circa Seriem Infinitam, wherein
he studied the hypergeometric series (see Section 11.2.1). Though Gauss is often 
mentioned as one of the first to recognize the need for restricting the series to their 
interval of convergence, he avoided any decisive position. He was so much concerned 
to solve concrete problems by numerical calculations that he used a divergent expansion of the gamma function. When he did investigate the convergence of the
hypergeometric series, he remarked that he did so to please those who favored the 
rigor of the ancient geometers. 

Cauchy’s work on the convergence of series is the first extensive treatment of 
the subject. In his Cours d'Analyse Cauchy clearly defines the sequence of partial 
sums and gives a rigorous definition of the convergence and divergence of the series 
in terms of this sequence. It is also in this work that he gives what is now called 
the Cauchy criterion for convergence of a sequence (see Box 9.1.1). He proves this 
to be a necessary condition, but merely remarks that if the condition holds, the 
convergence of the series is assured. He lacked the knowledge of the properties of 
real numbers to provide a proof. Cauchy then goes on to state and prove many of 
the results that we have outlined in our discussion of the tests for convergence. 


9.5 Problems 

9.1. Show that 


(a) $\sum_{k=1}^\infty kx^{k-1} = \sum_{k=0}^\infty (k+1)x^k$.

(b) $x^2\sum_{k=0}^\infty a_k x^k = \sum_{k=2}^\infty a_{k-2}x^k$.
9.2. Use some small values of M and N (say M = 2, N = 3) and verify the
validity of Equation (9.6). 

9.3. Use Equation (9.8) to show that 



9.4. Use mathematical induction to prove the following relations: 


(a) $\frac{d}{dx}(x^n) = nx^{n-1}$


(b) ELo xk = 


9.5. Use the integral test to show that the harmonic series of order $p$ is convergent for $p > 1$ and divergent for $p \le 1$.

9.6. Test the following series for convergence or divergence: 


(a\ ( - 1 )" n /'K'l V^ 00 (-l) n sin 

W Lju= 1 n 2 + l • Z^n=1 n+ 1 


jn=l n 2 + l ' 
■>oo n+1 


irll V°° n+1 (p\ V°° n+1 

vZ^n=l 3n 2 +3n' v c / Z^n=l 3n 2 +5n-l 


ir-l V°° l£ii 
vW Z-m=l nP * 

(f) E~ 2 ;nb- 


where a is some real number. For (c), consider the three cases p > 1, p < 1, 
and p = 1. 

9.7. Prove convergence or divergence by the comparison test: 

OO • OO OO v OO 

E sin TL \ -\ 1 \ 77- H - 5 v ■> 1 

n 2 ’ n 3 — 1 ’ n 2 — 3?r — 5 ’ Uft In n 

n=l n—2 n—1 n—2 v 

9.8. Prove convergence or divergence by the integral test: 

OO OO OO OO 

E l v 71 v > 1 v ■> 1 

n 2 + 1 ’ -E 77,2 -|- 1 ’ ^E n ^ n 2 > E> n l n n l n l n n ' 

n—1 n= 1 n=2 n—2 

9.9. Prove convergence or divergence by the ratio test: 

~ 2 n + l E, (_i)« ~ 5 n 

/ -J _l_ 77 5 / —. 77 I / 771 


9.10. Use the ratio test to find the range of values of x for which the following 
series converge. Make sure to investigate the end points. 


(a) E~ i 

(ln x) n 
n+1 

(b) 

^-^oo 

E-m=l 

4 n sin n a; 
(n+l)5 n . 

(c) 

y~~\ OO 

E-m= 

x n 

=1 v^' 

(d) E~ i 

(ln x ) n 
n! 

(e) 

v-^oo 
2-^n= 1 

x n 

3"n! ’ 

(f) 

Y^OO 

E-m= 

n 2 

■3 (x— 2) n * 

(g) En=l 
Y-'OO

E-/n—1 

n!x". 

(i) 

Y^oo 

Z-m= 
1 (In a;) 72 '

(j) E~=0 

nx n 
n 2 + 1 ' 

(k) 

^-^oo 

E-/n=l 

U 2 +i) n 

n 3 

(1) 

Y^oo 

2-^n— 

n 2 

1 (x+l) n ' 

(m) S“o (^)". 

( n ) 

V^OO 

E-/n=l 

(*)'■ 

(o) 

r-'OO 

E-m= 

(x-2)’ 1 
-0 n 2 + l ’ 

(p) E~ 0 

(!)"■ 

(q) 

^-^oo 

E-m=l 


(r) 

Y^oo 

E-m— 

.o++T-[6M 
9.11. Write the first four terms of the following series:


V_-_ V 

^ 2 • 4 • ••2n’ ^ 


(-i r 


E 


' 2 • 4 • • • 2 n ' ln(n 4- 1) . 

n= 1 n=l v 7 n —1 

Test for convergence or divergence of these series. 


10/—Q ’ 

Vr 


E 





Chapter 10

Application of Common Series


The preceding chapter concerned itself with the formal properties of infinite 
sequences and series, especially the sequences and series of functions. One 
of the useful properties of the infinite series of functions is that they can be 
approximated by finite sums. In this approximation, two important features 
of the series play crucial roles: the simplicity of the functions used in the series 
and the convergence of the series. This chapter deals with some of the series 
of functions most commonly used in mathematical physics. 


10.1 Power Series 

One of the most common series of functions is the power series, where the $n$th term of the series is $c_n(x-a)^n$ with $c_n$ a real number. To be specific, a power series in powers of $(x-a)$ is of the form

$$\sum_{n=0}^\infty c_n(x-a)^n = c_0 + c_1(x-a) + c_2(x-a)^2 + \cdots. \qquad (10.1)$$

An important special case is when $a = 0$, so that we have

$$\sum_{n=0}^\infty c_n x^n = c_0 + c_1 x + c_2 x^2 + \cdots. \qquad (10.2)$$




Sometimes negative powers are also included, but by power series we usually 
mean Equation (10.1). 

We note that Equation (10.1) converges for x = a. The question is whether 
it converges for any other values of x, and if so, what these values are. It turns 
out that: 
Theorem 10.1.1. Every power series $\sum_{n=0}^\infty c_n(x-a)^n$ has a radius of convergence $r^*$ such that the series converges absolutely when $|x-a| < r^*$ and diverges for $|x-a| > r^*$. If $r^* \ne 0$ and $r_1$ is a number such that $0 < r_1 < r^*$, then the series converges absolutely and uniformly for $|x-a| \le r_1$.

The number $r^*$ can be 0 (in which case the series converges only for $x = a$), a finite positive number, or $\infty$ (in which case the series converges for all $x$).

The radius of convergence can be evaluated by using the ratio test. Consider the ratio

$$r(x) = \lim_{n\to\infty}\left|\frac{c_{n+1}(x-a)^{n+1}}{c_n(x-a)^n}\right| = |x-a|\lim_{n\to\infty}\left|\frac{c_{n+1}}{c_n}\right|$$

and note that the series converges if $r(x) < 1$, or

$$|x-a| < \lim_{n\to\infty}\left|\frac{c_n}{c_{n+1}}\right|.$$

The RHS is naturally defined to be the radius of convergence

$$r^* = \lim_{n\to\infty}\left|\frac{c_n}{c_{n+1}}\right| \qquad (10.3)$$

if the limit exists.


It can be shown that the radius of convergence can also be found from the
following formula:

$$r^* = \lim_{n\to\infty}\frac{1}{\sqrt[n]{|c_n|}} \qquad (10.4)$$

if the limit exists.
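To see how (10.3) and (10.4) behave in practice, the sketch below evaluates both expressions at a few large but finite values of n for the coefficients $c_k = (-1)^k/[4^k(k+1)]$, the series treated in Example 10.1.3. The helper names are ours, and a finite-n value only hints at the limit; it does not by itself prove anything about convergence.

```python
import math

def ratio_radius(c, n):
    """Finite-n estimate of r* = lim |c_n / c_(n+1)|, Eq. (10.3)."""
    return abs(c(n) / c(n + 1))

def root_radius(log_abs_c, n):
    """Finite-n estimate of r* = lim 1 / |c_n|^(1/n), Eq. (10.4).

    Works with log|c_n| so that very small coefficients do not underflow.
    """
    return math.exp(-log_abs_c(n) / n)

# Coefficients c_k = (-1)^k / [4^k (k + 1)] (the series of Example 10.1.3).
def c(k):
    return (-1) ** k / (4.0 ** k * (k + 1))

def log_abs_c(k):
    return -k * math.log(4.0) - math.log(k + 1)

# Keep n modest: 4.0 ** n overflows a double for n >= 512.
for n in (10, 100, 400):
    print(n, ratio_radius(c, n), root_radius(log_abs_c, n))
```

Both estimates approach $r^* = 4$ from above; the root form (10.4) converges noticeably more slowly here because of the factor $(n+1)^{1/n}$.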


Example 10.1.2. Consider the exponential function $e^x$ which, as we shall see,
has a power series expansion

$$e^x = \sum_{n=0}^{\infty}\frac{x^n}{n!}.$$

By the ratio test, we have

$$r(x) = \lim_{n\to\infty}\left|\frac{x^{n+1}/(n+1)!}{x^n/n!}\right| = \lim_{n\to\infty}|x|\,\frac{n!}{(n+1)!} = |x|\lim_{n\to\infty}\frac{1}{n+1} = 0$$

for all values of $x$. So, regardless of $x$, the series representation of $e^x$ converges, i.e.,
the radius of convergence is infinite. We can also use Equation (10.3) to calculate
the radius of convergence:

$$r^* = \lim_{n\to\infty}\left|\frac{c_n}{c_{n+1}}\right| = \lim_{n\to\infty}\frac{1/n!}{1/(n+1)!} = \lim_{n\to\infty}|n+1| = \infty.$$


Example 10.1.3. Let us find the interval of convergence of

$$\sum_{k=0}^{\infty}\frac{(-1)^k x^k}{4^k(k+1)}.$$

The ratio test gives

$$r(x) = \lim_{k\to\infty}\left|\frac{f_{k+1}(x)}{f_k(x)}\right| = \lim_{k\to\infty}\left|\frac{(-1)^{k+1}x^{k+1}/[4^{k+1}(k+2)]}{(-1)^k x^k/[4^k(k+1)]}\right| = \lim_{k\to\infty}\left|\frac{x(k+1)}{4(k+2)}\right| = \frac{|x|}{4}\lim_{k\to\infty}\frac{k+1}{k+2} = \frac{|x|}{4}.$$

So, the series converges if $r(x) < 1$, i.e., if $|x| < 4$, or $-4 < x < 4$.
What about the end points? For $x = 4$, the series becomes

$$\sum_{k=0}^{\infty}\frac{(-1)^k}{k+1},$$

which is the alternating harmonic series, and it converges. On the other hand, if $x = -4$, the
series becomes

$$\sum_{k=0}^{\infty}\frac{(-1)^k(-4)^k}{4^k(k+1)} = \sum_{k=0}^{\infty}\frac{(-1)^k(-1)^k}{k+1} = \sum_{k=0}^{\infty}\frac{1}{k+1},$$

which is the divergent harmonic series. So, the interval of convergence of the series
is $-4 < x \le 4$, and its radius of convergence is $r^* = 4$. ■
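The end-point behavior of Example 10.1.3 can be watched numerically. In the minimal sketch below (function name ours), each term is $t^k/(k+1)$ with $t = -x/4$. At $x = 4$ the partial sums settle toward $\ln 2$, the sum of the alternating harmonic series; at $x = -4$ they keep growing roughly like $\ln n$. Inside the interval the sum equals $-\ln(1-t)/t$, which follows from (10.23).

```python
import math

def partial_sum(x, n_terms):
    """Partial sum of sum_{k>=0} (-1)^k x^k / [4^k (k+1)] with n_terms terms."""
    t = -x / 4.0          # each term is t^k / (k+1)
    total, power = 0.0, 1.0
    for k in range(n_terms):
        total += power / (k + 1)
        power *= t
    return total

# x = 4: alternating harmonic series; the partial sums settle down (to ln 2).
s_plus = partial_sum(4.0, 100_000)
# x = -4: harmonic series; the partial sums keep growing like ln(n).
s_minus_small = partial_sum(-4.0, 1_000)
s_minus_large = partial_sum(-4.0, 100_000)
print(s_plus, math.log(2))
print(s_minus_small, s_minus_large)
```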


Because of the uniform convergence of power series, we can perform all 
the common operations used for ordinary functions on the power series. We 
list all these properties in the following: 


► Continuity. A power series represents a continuous function within its
radius of convergence; i.e., if $r^*$ is the radius of convergence, then the
series

$$f(x) = \sum_{n=0}^{\infty} c_n(x-a)^n \quad \text{for } a - r^* < x < a + r^* \qquad (10.5)$$

is continuous.

► Integration. The power series (10.5) can be integrated term by term
within its radius of convergence; i.e., for $a - r^* < p < q < a + r^*$,

$$\int_p^q f(t)\,dt = \sum_{n=0}^{\infty} c_n\int_p^q (t-a)^n\,dt = \sum_{n=0}^{\infty} c_n\,\frac{(q-a)^{n+1} - (p-a)^{n+1}}{n+1}. \qquad (10.6)$$

► Differentiation. The power series (10.5) can be differentiated term by
term within its radius of convergence; that is,

$$f'(x) = \sum_{n=1}^{\infty} n c_n(x-a)^{n-1}, \quad a - r^* < x < a + r^*. \qquad (10.7)$$

► Zero Power Series. If a power series has a nonzero radius of convergence
and has a sum that is identically zero, then every coefficient of the
series must be zero. This leads to the following
Theorem 10.1.4. If two power series $\sum_{n=0}^{\infty} c_n(x-a)^n$ and $\sum_{n=0}^{\infty} b_n(x-a)^n$
have nonzero convergence radii and have equal sums whenever
both series converge, then the two series are identical, i.e.,
$c_n = b_n$, $n = 0, 1, 2, \ldots$.

This property is very effectively used to find solutions of differential 
equations in terms of infinite power series. 
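Term-by-term differentiation (10.7) and integration (10.6) act on a power series simply by transforming its list of coefficients. The minimal sketch below (with $a = 0$, truncated lists, and exact fractions; the function names are ours) applies both operations to the coefficients $1/n!$ of $e^x$. By Theorem 10.1.4, equal coefficient lists mean equal series, and the results reproduce the series of $e^x$ itself, as they should.

```python
from fractions import Fraction
from math import factorial

def differentiate(coeffs):
    """Coefficients of f'(x) from those of f(x) = sum c_n x^n (Eq. 10.7 with a = 0)."""
    return [n * c for n, c in enumerate(coeffs)][1:]

def integrate(coeffs, constant=0):
    """Coefficients of an antiderivative of f, term by term as in Eq. (10.6)."""
    return [Fraction(constant)] + [c / (n + 1) for n, c in enumerate(coeffs)]

# Truncated Maclaurin coefficients of e^x: c_n = 1/n!.
N = 10
exp_coeffs = [Fraction(1, factorial(n)) for n in range(N)]

# Differentiating term by term reproduces the coefficients (one fewer, due to truncation).
print(differentiate(exp_coeffs) == exp_coeffs[:-1])
# Integrating term by term with constant 1 reproduces them too, plus one extra term.
print(integrate(exp_coeffs, 1)[:N] == exp_coeffs)
```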


10.1.1 Taylor Series 

A power series whose coefficients are derivatives of the function representing
the sum is called a Taylor series. More precisely, let

$$f(x) = \sum_{n=0}^{\infty} c_n(x-a)^n, \quad a - r^* < x < a + r^*. \qquad (10.8)$$

This series is called the Taylor series of $f(x)$ at $x = a$ if the coefficients $c_n$ are
given by the rule

$$c_0 = f(a), \quad c_1 = \frac{f'(a)}{1!}, \quad c_2 = \frac{f''(a)}{2!}, \quad \ldots, \quad c_k = \frac{f^{(k)}(a)}{k!}, \quad \ldots,$$

so that

$$f(x) = f(a) + \frac{f'(a)}{1!}(x-a) + \cdots + \frac{f^{(k)}(a)}{k!}(x-a)^k + \cdots = \sum_{k=0}^{\infty}\frac{f^{(k)}(a)}{k!}(x-a)^k, \qquad (10.9)$$

where $f^{(0)}(a) = f(a)$ and $0! = 1$.


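From Theorem 10.1.4 and the equality of (10.8) and (10.9), we conclude that
every power series with nonzero convergence radius is the Taylor series of the
function denoting its sum. Conversely, an infinitely differentiable function
can be represented by its Taylor series within the interval of convergence
of the series, provided the series actually converges to the function there.
(This proviso is not automatic: the function equal to $e^{-1/x^2}$ for $x \neq 0$ and to 0
at $x = 0$ has all derivatives equal to zero at the origin, so its Maclaurin series is identically zero.)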

An alternative way of writing the Taylor series, which suggests approximation,
is to let $\Delta x = x - a$. Then Equation (10.9) becomes

$$f(a + \Delta x) = f(a) + \frac{f'(a)}{1!}\Delta x + \cdots = \sum_{k=0}^{\infty}\frac{f^{(k)}(a)}{k!}(\Delta x)^k.$$

Since $a$ is an arbitrary real number, we can replace it with $x$, which is more
suggestive of the generality of this formula:

$$f(x + \Delta x) = f(x) + \frac{f'(x)}{1!}\Delta x + \cdots = \sum_{k=0}^{\infty}\frac{f^{(k)}(x)}{k!}(\Delta x)^k. \qquad (10.10)$$

With $\Delta x$ interpreted as the increment in $x$, Equation (10.10) states that the
function at the incremented value of $x$ is $f(x)$ plus a “correction” involving
all powers of $\Delta x$. The smaller the increment, the fewer terms
of the correction we need to keep to achieve a given accuracy.
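The last statement can be made quantitative with a small experiment. The sketch below (function names ours) uses $\sin$, whose derivatives cycle through $\sin, \cos, -\sin, -\cos$, keeps the terms of (10.10) up to $(\Delta x)^3$, and measures the error for shrinking $\Delta x$. Halving $\Delta x$ should cut the error by roughly $2^4 = 16$, since the first omitted term is of order $(\Delta x)^4$.

```python
import math

def sin_derivative(k, x):
    """k-th derivative of sin at x: the cycle sin, cos, -sin, -cos."""
    return [math.sin, math.cos, lambda t: -math.sin(t), lambda t: -math.cos(t)][k % 4](x)

def taylor_step(x, dx, order):
    """Approximate sin(x + dx) by keeping the terms k = 0..order of Eq. (10.10)."""
    return sum(sin_derivative(k, x) * dx**k / math.factorial(k) for k in range(order + 1))

x = 0.7
for dx in (0.4, 0.2, 0.1):
    err = abs(taylor_step(x, dx, 3) - math.sin(x + dx))
    print(f"dx = {dx:4}: error with terms up to (dx)^3 = {err:.2e}")
```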

A convenient value for $a$ is 0, in which case the series is called a Maclaurin
series:

$$f(x) = f(0) + \frac{f'(0)}{1!}x + \frac{f''(0)}{2!}x^2 + \cdots = \sum_{k=0}^{\infty}\frac{f^{(k)}(0)}{k!}x^k. \qquad (10.11)$$


10.2 Series for Some Familiar Functions 

In this section, we give the Maclaurin series representations of a few familiar
functions. These representations are so useful that the reader is urged to 
commit them to memory. 


The Exponential Function 


For $e^x$, the derivatives of all orders are $e^x$, implying that $f^{(n)}(0) = 1$ for all $n$.
Therefore,

$$e^x = 1 + \frac{x}{1!} + \frac{x^2}{2!} + \cdots = \sum_{n=0}^{\infty}\frac{x^n}{n!}. \qquad (10.12)$$

This series converges for all $x$, as we saw in Example 10.1.2.


The Trigonometric Functions 

The sine function has the following derivatives:

$$f'(x) = \cos x, \quad f''(x) = -\sin x, \quad f'''(x) = -\cos x, \quad f^{(iv)}(x) = \sin x, \quad \ldots.$$

This can be summarized as

$$f^{(n)}(x) = \begin{cases} (-1)^{n/2}\sin x & \text{if } n \text{ is even},\\ (-1)^{(n-1)/2}\cos x & \text{if } n \text{ is odd}.\end{cases}$$

Evaluating at $x = 0$ for the Maclaurin series yields

$$f^{(n)}(0) = \begin{cases} 0 & \text{if } n \text{ is even},\\ (-1)^{(n-1)/2} & \text{if } n \text{ is odd},\end{cases}$$

so that

$$\sin x = 0 + x - 0 - \frac{x^3}{3!} + 0 + \frac{x^5}{5!} - \cdots = \sum_{k=0}^{\infty}(-1)^k\frac{x^{2k+1}}{(2k+1)!}. \qquad (10.13)$$

The combination $2k+1$ ensures that only odd terms are included even though
there is no restriction on the sum over $k$. The radius of convergence is

$$r^* = \lim_{k\to\infty}\left|\frac{(-1)^k/(2k+1)!}{(-1)^{k+1}/(2k+3)!}\right| = \lim_{k\to\infty}\frac{(2k+3)!}{(2k+1)!} = \infty.$$

Thus, the Taylor series representation of the sine function is convergent for
all $x$.

The Maclaurin series representation of the cosine function can be obtained
similarly. We leave the details to the reader and simply quote the result:

$$\cos x = \sum_{k=0}^{\infty}(-1)^k\frac{x^{2k}}{(2k)!}, \quad -\infty < x < \infty. \qquad (10.14)$$
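A quick way to build confidence in (10.13) and (10.14) is to compare their partial sums with a library implementation. In the minimal sketch below (function names ours), 30 terms are kept; the infinite radius of convergence means any $x$ works, but larger $|x|$ needs more terms and suffers more round-off from the large intermediate terms.

```python
import math

def maclaurin_sin(x, n_terms):
    """Partial sum of Eq. (10.13): sum_{k<n_terms} (-1)^k x^(2k+1) / (2k+1)!."""
    return sum((-1) ** k * x ** (2 * k + 1) / math.factorial(2 * k + 1) for k in range(n_terms))

def maclaurin_cos(x, n_terms):
    """Partial sum of Eq. (10.14): sum_{k<n_terms} (-1)^k x^(2k) / (2k)!."""
    return sum((-1) ** k * x ** (2 * k) / math.factorial(2 * k) for k in range(n_terms))

# The radius of convergence is infinite, but larger |x| needs more terms.
for x in (0.5, 3.0, 10.0):
    print(x, maclaurin_sin(x, 30) - math.sin(x), maclaurin_cos(x, 30) - math.cos(x))
```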


The Binomial Function 


Another useful function which is used extensively in physics is the binomial
function with arbitrary exponent, i.e., $(1+x)^\alpha$ with $\alpha$ an arbitrary real number. It is easy to find the nth derivative of this function:

$$f^{(n)}(x) = \alpha(\alpha-1)(\alpha-2)\cdots(\alpha-n+1)(1+x)^{\alpha-n}, \quad n \ge 1.$$

Evaluating this at $x = 0$ gives

$$c_n = \frac{f^{(n)}(0)}{n!} = \frac{\alpha(\alpha-1)\cdots(\alpha-n+1)}{n!}, \quad n \ge 1.$$

From this, we can immediately find the radius of convergence:

$$r^* = \lim_{n\to\infty}\left|\frac{c_n}{c_{n+1}}\right| = \lim_{n\to\infty}\left|\frac{\alpha(\alpha-1)\cdots(\alpha-n+1)}{n!}\cdot\frac{(n+1)!}{\alpha(\alpha-1)\cdots(\alpha-n)}\right| = \lim_{n\to\infty}\left|\frac{n+1}{\alpha-n}\right| = 1.$$

Thus, the series is convergent for $-1 < x < 1$, and we can write

$$(1+x)^\alpha = 1 + \sum_{n=1}^{\infty}\frac{\alpha(\alpha-1)\cdots(\alpha-n+1)}{n!}x^n, \quad -1 < x < 1. \qquad (10.15)$$

Example 10.2.1. Because of the frequent occurrence of the square root, we work
through the calculation of (10.15) for $\alpha = \frac12$. For $\alpha = +\frac12$, we have

$$\sqrt{1+x} = (1+x)^{1/2} = 1 + \sum_{n=1}^{\infty}\frac{\frac12\left(\frac12-1\right)\cdots\left(\frac12-n+1\right)}{n!}x^n = 1 + \frac12 x + \sum_{n=2}^{\infty}(-1)^{n-1}\frac{1\cdot3\cdot5\cdots(2n-3)}{2^n n!}x^n.$$

Now let $n = m + 1$ and rewrite the sum as

$$\sqrt{1+x} = 1 + \frac12 x + \sum_{m=1}^{\infty}(-1)^m\frac{1\cdot3\cdot5\cdots(2m-1)}{2^{m+1}(m+1)!}x^{m+1} = 1 + \frac12 x - \frac18 x^2 + \frac1{16}x^3 - \cdots. \qquad (10.16)$$

The case of $\alpha = -\frac12$ can be handled in exactly the same way. We simply quote
the result

$$\frac{1}{\sqrt{1+x}} = 1 + \sum_{m=1}^{\infty}(-1)^m\frac{1\cdot3\cdot5\cdots(2m-1)}{2^m m!}x^m = 1 - \frac12 x + \frac38 x^2 - \frac5{16}x^3 + \cdots \qquad (10.17)$$

and urge the reader to fill in the details.
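The coefficients in (10.15) satisfy the simple recurrence $c_{n+1} = c_n(\alpha - n)/(n+1)$, which is a convenient way to generate them. The sketch below (function names ours) uses exact fractions to reproduce the coefficients of (10.16) and (10.17), and then checks a floating-point partial sum of $\sqrt{1+x}$ at $x = 0.2$, well inside $-1 < x < 1$.

```python
from fractions import Fraction

def binomial_coeffs(alpha, n_max):
    """c_0..c_{n_max} of (1+x)^alpha from Eq. (10.15), via c_{n+1} = c_n (alpha - n)/(n + 1)."""
    alpha = Fraction(alpha)
    coeffs = [Fraction(1)]
    for n in range(n_max):
        coeffs.append(coeffs[-1] * (alpha - n) / (n + 1))
    return coeffs

def binomial_series(alpha, x, n_max):
    """Partial sum of (1+x)^alpha; meaningful only for -1 < x < 1."""
    return sum(float(c) * x**n for n, c in enumerate(binomial_coeffs(alpha, n_max)))

print(binomial_coeffs(Fraction(1, 2), 4))   # coefficients of Eq. (10.16)
print(binomial_coeffs(Fraction(-1, 2), 3))  # coefficients of Eq. (10.17)
print(binomial_series(0.5, 0.2, 20), 1.2 ** 0.5)
```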


It is important to note the limitations of the power series representation
of a function: Although $(1+x)^\alpha$ is defined for all positive values of $x$,¹ the
power series representation of it is good only for a limited region of the real
line.

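In many applications, the binomial function appears in the form $(u+v)^\alpha$,
where $|v| < |u|$ and one is interested in the power series expansion in $v/u$.
This is easily done:

$$(u+v)^\alpha = \left\{u\left(1+\frac{v}{u}\right)\right\}^\alpha = u^\alpha\left(1+\frac{v}{u}\right)^\alpha = u^\alpha\left[1 + \sum_{n=1}^{\infty}\frac{\alpha(\alpha-1)\cdots(\alpha-n+1)}{n!}\left(\frac{v}{u}\right)^n\right]$$

$$= u^\alpha + \sum_{n=1}^{\infty}\frac{\alpha(\alpha-1)\cdots(\alpha-n+1)}{n!}u^{\alpha-n}v^n, \quad -|u| < v < |u|. \qquad (10.18)$$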


In practice, v is usually much smaller than u, and the requirement of conver¬ 
gence is overwhelmingly met. 


The Hyperbolic Functions 


The exponential function and the trigonometric functions have very similar
power series: Except for the (crucial) coefficient $(-1)^k$, $\sin x$ appears to be
the odd part of the expansion of $e^x$ and $\cos x$ its even part. The $(-1)^k$ factor
makes the trigonometric functions periodic. What if we take this factor away,
and simply collect the even powers in the expansion of $e^x$ together and do the same to the
odd powers? The resulting series will of course be (absolutely and uniformly)
convergent because the exponential series is. So, let us introduce the following
functions:

$$\sinh x = \sum_{k=0}^{\infty}\frac{x^{2k+1}}{(2k+1)!} = x + \frac{x^3}{3!} + \cdots, \qquad \cosh x = \sum_{k=0}^{\infty}\frac{x^{2k}}{(2k)!} = 1 + \frac{x^2}{2!} + \cdots. \qquad (10.19)$$

1 It is really defined for more than just positive values. For instance, if $\alpha$ is a nonnegative integer,
the function is defined for all values of $x$. For fractional powers such as $\alpha = 1/2$, $1 + x$
cannot be negative, so that we must restrict the values of $x$ to $x > -1$.
$\sinh x$ (pronounced “sinch”) is called the hyperbolic sine function. Similarly,
$\cosh x$ (pronounced “kahsh”) is called the hyperbolic cosine function. By
their very definition, we have

$$e^x = \cosh x + \sinh x.$$

If we change $x$ to $-x$, and note that $\sinh x$ is odd and $\cosh x$ is even, we can
also write

$$e^{-x} = \cosh(-x) + \sinh(-x) = \cosh x - \sinh x.$$

Adding and subtracting the last two equations yields

$$\cosh x = \frac{e^x + e^{-x}}{2}, \qquad \sinh x = \frac{e^x - e^{-x}}{2}. \qquad (10.20)$$

This is how the hyperbolic functions are usually defined. From these definitions, one can obtain a host of relations for sinh and cosh that look similar
to the relations satisfied by sine and cosine. For example, it is easy to show
that

$$\cosh^2 x - \sinh^2 x = 1, \qquad \frac{d}{dx}\cosh x = \sinh x, \qquad \frac{d}{dx}\sinh x = \cosh x,$$

$$\cosh(x \pm y) = \cosh x\cosh y \pm \sinh x\sinh y, \qquad (10.21)$$

$$\sinh(x \pm y) = \sinh x\cosh y \pm \cosh x\sinh y,$$

$$\cosh(2x) = \cosh^2 x + \sinh^2 x, \qquad \sinh(2x) = 2\sinh x\cosh x.$$

We give the derivation for the hyperbolic cosine of the sum, leaving the rest 
of them as problems for the reader. We start with the RHS: 


$$\cosh x\cosh y + \sinh x\sinh y = \frac{(e^x + e^{-x})(e^y + e^{-y}) + (e^x - e^{-x})(e^y - e^{-y})}{4}$$

$$= \frac{e^{x+y} + e^{x-y} + e^{-x+y} + e^{-x-y} + e^{x+y} - e^{x-y} - e^{-x+y} + e^{-x-y}}{4}$$

$$= \frac{2e^{x+y} + 2e^{-x-y}}{4} = \frac{e^{x+y} + e^{-x-y}}{2} = \cosh(x + y).$$

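We can also define the analogs of other trigonometric functions:

$$\tanh x = \frac{\sinh x}{\cosh x} = \frac{e^x - e^{-x}}{e^x + e^{-x}}, \qquad \coth x = \frac{\cosh x}{\sinh x} = \frac{e^x + e^{-x}}{e^x - e^{-x}}, \qquad (10.22)$$

$$\operatorname{sech} x = \frac{1}{\cosh x} = \frac{2}{e^x + e^{-x}}, \qquad \operatorname{cosech} x = \frac{1}{\sinh x} = \frac{2}{e^x - e^{-x}}.$$

These functions have such properties as

$$\operatorname{sech}^2 x = 1 - \tanh^2 x, \qquad \operatorname{cosech}^2 x = \coth^2 x - 1,$$

$$\frac{d}{dx}\tanh x = \operatorname{sech}^2 x, \qquad \frac{d}{dx}\coth x = -\operatorname{cosech}^2 x.$$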
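The identities above are easy to spot-check in floating point. The minimal sketch below compares the series (10.19) with the exponential forms (10.20) and evaluates the residuals of a few identities from (10.21) and (10.22) at arbitrarily chosen arguments; a residual at the level of round-off is consistent with the identity, though of course a numerical check is not a proof. The function names are ours.

```python
import math

def sinh_series(x, n_terms=30):
    """Partial sum of the sinh series in Eq. (10.19)."""
    return sum(x ** (2 * k + 1) / math.factorial(2 * k + 1) for k in range(n_terms))

def cosh_series(x, n_terms=30):
    """Partial sum of the cosh series in Eq. (10.19)."""
    return sum(x ** (2 * k) / math.factorial(2 * k) for k in range(n_terms))

x, y = 0.8, -1.3   # arbitrary test arguments
checks = {
    "series vs (10.20), sinh": sinh_series(x) - (math.exp(x) - math.exp(-x)) / 2,
    "series vs (10.20), cosh": cosh_series(x) - (math.exp(x) + math.exp(-x)) / 2,
    "cosh^2 - sinh^2 - 1": math.cosh(x) ** 2 - math.sinh(x) ** 2 - 1,
    "cosh(x+y) identity": math.cosh(x + y) - (math.cosh(x) * math.cosh(y) + math.sinh(x) * math.sinh(y)),
    "sech^2 = 1 - tanh^2": 1 / math.cosh(x) ** 2 - (1 - math.tanh(x) ** 2),
}
for name, residual in checks.items():
    print(f"{name:26s} {residual:+.1e}")
```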
The Logarithmic Function

Finally, we state the Maclaurin series for $\ln(1+x)$, which occurs frequently
in physics, and which the reader can verify:

$$\ln(1+x) = \sum_{n=1}^{\infty}(-1)^{n+1}\frac{x^n}{n}, \quad -1 < x < 1. \qquad (10.23)$$
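Unlike the exponential and trigonometric series, (10.23) has radius of convergence 1, so the number of terms needed grows quickly as $x$ approaches the edge of the interval. The sketch below (function name ours) makes this visible by comparing partial sums with `math.log1p` from Python's standard library.

```python
import math

def log1p_series(x, n_terms):
    """Partial sum of Eq. (10.23): sum_{n=1}^{n_terms} (-1)^(n+1) x^n / n."""
    return sum((-1) ** (n + 1) * x**n / n for n in range(1, n_terms + 1))

for x in (0.1, 0.5, 0.9):
    for n_terms in (5, 20, 80):
        print(x, n_terms, abs(log1p_series(x, n_terms) - math.log1p(x)))
```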


10.3 Helmholtz Coil 

Power series are very useful tools for approximating functions, and the closer 
one gets to the point of expansion, the better the approximation. The essence 
of this approximation is replacing the infinite series with a finite sum, i.e., 
approximating the function with a polynomial. 

In general, to get a very good approximation, one has to retain many terms
of the power series, so the approximating polynomial will have to
be of a high degree. However, suppose that a function $f(x)$ has the following
expansion:

$$f(x) = c_0 + c_1(x-a) + \cdots + c_m(x-a)^m + c_{m+k}(x-a)^{m+k} + \cdots,$$

where $k$ is a fairly large number. Then the polynomial

$$p(x) = c_0 + c_1(x-a) + \cdots + c_m(x-a)^m$$

approximates the function very accurately because, as long as we are “close”
enough to the point of expansion $a$, the next term in the series will not affect
the polynomial much. In particular, if the series looks like

$$f(x) = c_0 + c_k(x-a)^k + \cdots, \qquad (10.24)$$

then the constant “polynomial” $c_0$ is an extremely good approximation to the
function for values of $x$ close to $a$.

The argument above can be used to design devices that produce physical
quantities that are nearly constant over a fairly large range of the variable on which the
outcome of the device depends. A case in point is the Helmholtz coil, which
is used frequently in laboratory situations in which homogeneous magnetic
fields are desirable.

Figure 10.1 shows two loops of current-carrying wire of radii $a$ and $b$
separated by a distance $L$. We are interested in the $z$-component of the
magnetic field midway between the two loops, which, to simplify expressions,
we have chosen to be the origin. Example 4.1.4 gives the expression for the
magnetic field of a loop at a point on its axis at a distance $z$ from its center.




Figure 10.1: Two circular loops with different radii producing a magnetic field. 


Let us denote the magnetic field of the loop of radius $a$ by $B_1$ and that of the
loop of radius $b$ by $B_2$. Then Example 4.1.4 gives

$$B(z) = B_1(z) + B_2(z) = \frac{2\pi k_m I_1 a^2}{[a^2 + (z + L/2)^2]^{3/2}} + \frac{2\pi k_m I_2 b^2}{[b^2 + (z - L/2)^2]^{3/2}}$$

$$= \frac{16\pi k_m I_1 a^2}{[4a^2 + (2z + L)^2]^{3/2}} + \frac{16\pi k_m I_2 b^2}{[4b^2 + (2z - L)^2]^{3/2}}. \qquad (10.25)$$


We want to adjust the parameters of the two loops in such a way that the
magnetic field at the origin is maximally homogeneous. This can be accomplished by setting as many derivatives of $B(z)$ equal to zero at the origin as
possible, so that the Maclaurin expansion of $B(z)$ will have a maximum number of consecutive terms equal to zero, i.e., we will have an expression of the
form (10.24).

The first derivative of $B(z)$ is

$$\frac{dB}{dz} = -\frac{96\pi k_m I_1 a^2(2z + L)}{[4a^2 + (2z + L)^2]^{5/2}} - \frac{96\pi k_m I_2 b^2(2z - L)}{[4b^2 + (2z - L)^2]^{5/2}}.$$

Setting this equal to zero at $z = 0$ gives

$$\frac{I_1 a^2}{(4a^2 + L^2)^{5/2}} = \frac{I_2 b^2}{(4b^2 + L^2)^{5/2}}. \qquad (10.26)$$

The second derivative of $B(z)$ is

$$\frac{d^2B}{dz^2} = -\frac{768\pi k_m I_1 a^2[a^2 - (2z + L)^2]}{[4a^2 + (2z + L)^2]^{7/2}} - \frac{768\pi k_m I_2 b^2[b^2 - (2z - L)^2]}{[4b^2 + (2z - L)^2]^{7/2}}.$$

Setting this equal to zero at $z = 0$ gives

$$\frac{I_1 a^2(a^2 - L^2)}{(4a^2 + L^2)^{7/2}} + \frac{I_2 b^2(b^2 - L^2)}{(4b^2 + L^2)^{7/2}} = 0. \qquad (10.27)$$


The two terms in (10.27) need not vanish separately in general; however, the
simplest solution, and the one that leads to a symmetric arrangement, is obtained by
requiring each term on the LHS to vanish. It follows that $a = L = b$. Substituting
this in Equation (10.26) gives $I_1 = I_2$, which we denote by $I$. Therefore, we
can now write the magnetic field as

$$B(z) = 16\pi k_m I a^2\left\{\frac{1}{[4a^2 + (2z + a)^2]^{3/2}} + \frac{1}{[4a^2 + (2z - a)^2]^{3/2}}\right\}. \qquad (10.28)$$

The reader may verify that not only are the first and the second derivatives
of $B(z)$ of Equation (10.28) zero at $z = 0$, but also its third derivative. In fact, we have

$$B(z) = \frac{32\pi k_m I}{5\sqrt5\,a} - \frac{4608\pi k_m I}{625\sqrt5\,a^5}z^4 + \cdots. \qquad (10.29)$$


That only even powers appear in the expansion (10.29) could have been anticipated, because (10.28) is even in $z$, as the reader may easily verify. It follows
from Equation (10.29) that $B(z)$ should be fairly insensitive to the variation
of $z$ at points close to the origin. Physically, this means that the magnetic
field is fairly homogeneous at the midpoint between the two loops as long as
the loops are equal and separated by a distance equal to their common radius,
and as long as they carry the same current. Figure 10.2 shows the plot of the
magnetic field as a function of $z$. Note how flat the function is even for fairly
large values of $z$.

Figure 10.2: Magnetic field of a Helmholtz coil as a function of $z$. The horizontal axis
is $z$ in units of $a$.
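The flatness claimed for (10.29) can be checked directly against the closed form (10.28). In the minimal sketch below we assume units in which $k_m I = 1$ and $a = 1$ (an illustrative choice, not from the text); the function names are ours. The two-term expansion should track the exact field very closely near the origin, with the discrepancy growing like $z^6$.

```python
import math

# Illustrative units (assumption): k_m * I = 1 and common loop radius a = 1.
K_M_I = 1.0
A = 1.0

def helmholtz_field(z, a=A, kmi=K_M_I):
    """On-axis field of two equal coaxial loops a distance a apart, Eq. (10.28)."""
    return 16 * math.pi * kmi * a**2 * (
        1 / (4 * a**2 + (2 * z + a) ** 2) ** 1.5 + 1 / (4 * a**2 + (2 * z - a) ** 2) ** 1.5
    )

def two_term_expansion(z, a=A, kmi=K_M_I):
    """The first two terms of Eq. (10.29)."""
    root5 = math.sqrt(5)
    return 32 * math.pi * kmi / (5 * root5 * a) - 4608 * math.pi * kmi * z**4 / (625 * root5 * a**5)

b0 = helmholtz_field(0.0)
for z in (0.05, 0.1, 0.2, 0.3):
    exact = helmholtz_field(z)
    print(f"z/a = {z:4}: B/B(0) = {exact / b0:.6f}, expansion/B(0) = {two_term_expansion(z) / b0:.6f}")
```

The ratio of the two coefficients in (10.29) is $144/125 = 1.152$, so the field drops by only about $1.2\times10^{-4}$ of its central value at $z = 0.1a$.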


One of the problems faced by mathematicians of the late seventeenth and early eighteenth centuries was interpolation (the word was coined by Wallis) of tables of values.
Greater accuracy of the interpolated values of the trigonometric, logarithmic, and
nautical tables was necessary to keep pace with progress in navigation, astronomy,
and geography. The common method of interpolation, whereby one takes the average of the two consecutive entries of a table, is called linear interpolation because
it gives the exact result for a linear function. This gives a crude approximation for
functions that are not linear, and mathematicians realized that a better method of
interpolation was needed.

The general method, which can give more and more accurate interpolations,
was given by Gregory and independently by Newton. Suppose $f(x)$ is a function
whose values are given at $a, a + h, a + 2h, \ldots$, and we are interested in the value
of the function at an $x$ that lies between two table entries. The Gregory-Newton
formula states that

$$f(a + r) = f(a) + \frac{r}{h}\Delta f(a) + \frac{\frac{r}{h}\left(\frac{r}{h} - 1\right)}{2!}\Delta^2 f(a) + \frac{\frac{r}{h}\left(\frac{r}{h} - 1\right)\left(\frac{r}{h} - 2\right)}{3!}\Delta^3 f(a) + \cdots,$$

where

$$\Delta f(a) = f(a + h) - f(a), \qquad \Delta^2 f(a) = \Delta f(a + h) - \Delta f(a),$$

$$\Delta^3 f(a) = \Delta^2 f(a + h) - \Delta^2 f(a), \qquad \Delta^4 f(a) = \Delta^3 f(a + h) - \Delta^3 f(a), \ldots$$

To calculate $f$ at any value $y$ between the known values, one simply substitutes $y - a$
for $r$.
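The Gregory-Newton formula translates directly into a short routine: build the forward differences $\Delta^k f(a)$ from the table, then accumulate the terms with the factor $s(s-1)\cdots(s-k+1)/k!$, where $s = r/h$. The sketch below (function names ours) uses exact fractions and a table of $x^3$; because the fourth and higher differences of a cubic vanish, the interpolation should be exact.

```python
from fractions import Fraction

def forward_differences(values):
    """Return [f(a), Δf(a), Δ²f(a), ...] from equally spaced table values."""
    diffs = [values[0]]
    row = list(values)
    while len(row) > 1:
        row = [row[i + 1] - row[i] for i in range(len(row) - 1)]
        diffs.append(row[0])
    return diffs

def gregory_newton(values, a, h, y):
    """Interpolate f(y) from a table f(a), f(a+h), ... with the Gregory-Newton formula."""
    s = (y - a) / h        # s = r/h with r = y - a
    total, coeff = 0, 1    # coeff = s(s-1)...(s-k+1)/k!
    for k, delta in enumerate(forward_differences(values)):
        total += coeff * delta
        coeff = coeff * (s - k) / (k + 1)
    return total

# A cubic is reproduced exactly because its fourth and higher differences vanish.
table = [Fraction(x) ** 3 for x in range(5)]          # x = 0, 1, 2, 3, 4
print(gregory_newton(table, Fraction(0), Fraction(1), Fraction(3, 2)))  # 27/8
```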

Brook Taylor’s Methodus incrementorum directa et inversa, published in 1715,
added to mathematics a new branch now called the calculus of finite differences;
Taylor is also credited with inventing integration by parts. The book also contained the celebrated formula known
as Taylor’s expansion, the importance of which remained unrecognized until 1772,
when Lagrange proclaimed it the basic principle of the differential calculus.

Brook Taylor (1685-1731)

To arrive at the series that bears his name, Taylor let $h$ in the Gregory-Newton
formula be $\Delta x$ and took the limit of smaller and smaller $\Delta x$. Thus, the third term,
for example, gave

$$\frac{r(r - \Delta x)}{2!}\,\frac{\Delta^2 f(a)}{\Delta x^2} \to \frac{r^2}{2!}f''(a),$$

which is the familiar third term in the Taylor series.

In 1708 Taylor produced a solution to the problem of the center of oscillation 
which, since it went unpublished until 1714, resulted in a priority dispute with 
Johann Bernoulli. 

Taylor also devised the basic principles of perspective in Linear Perspective
(1715). Together with New Principles of Linear Perspective, this work gave the first general treatment
of vanishing points.

Taylor gives an account of an experiment to discover the law of magnetic attrac¬ 
tion (1715) and an improved method for approximating the roots of an equation by 
giving a new method for computing logarithms (1717). 

Taylor was elected a Fellow of the Royal Society in 1712 and was appointed in 
that year to the committee for adjudicating the claims of Newton and of Leibniz to 
have invented the calculus. 



10.4 Indeterminate Forms and L’Hopital’s Rule 

It is good practice to approximate functions with their power series representations, keeping as many terms as are necessary for a given accuracy. This
practice is especially useful when encountering indeterminate expressions of
the form 0/0. Although L’Hopital’s rule (discussed below) can be used to find
the limit of such a ratio, on many occasions the substitution of the series leads directly to
the answer, saving us the labor of multiple differentiation.
Example 10.4.1. Let us look at some examples of the ratios mentioned above.
In all cases treated in this example, the substitution $x = 0$ gives 0/0, which is indeterminate. Using the Maclaurin series (10.12) and (10.13), we get

$$\lim_{x\to0}\frac{2e^x - 2 - 2x - x^2}{\sin x - x} = \lim_{x\to0}\frac{2(1 + x + x^2/2 + x^3/6 + x^4/24 + \cdots) - 2 - 2x - x^2}{x - x^3/6 + x^5/120 - \cdots - x}$$

$$= \lim_{x\to0}\frac{x^3/3 + x^4/12 + \cdots}{-x^3/6 + x^5/120 - \cdots} = \lim_{x\to0}\frac{1/3 + x/12 + \cdots}{-1/6 + x^2/120 - \cdots} = -2.$$

The series (10.14) and (10.23) can be used to evaluate the following limit:

$$\lim_{x\to0}\frac{\ln(1+x) - x}{\cos x - 1} = \lim_{x\to0}\frac{x - x^2/2 + x^3/3 - \cdots - x}{1 - x^2/2 + x^4/24 - \cdots - 1}$$

$$= \lim_{x\to0}\frac{-x^2/2 + x^3/3 - \cdots}{-x^2/2 + x^4/24 - \cdots} = \lim_{x\to0}\frac{-1/2 + x/3 - \cdots}{-1/2 + x^2/24 - \cdots} = 1.$$

With (10.12) and (10.15), we have

$$\lim_{x\to0}\frac{\sqrt{1+2x} - x - 1}{e^{x^2} - 1} = \lim_{x\to0}\frac{1 + \frac12(2x) - \frac18(2x)^2 + \frac1{16}(2x)^3 + \cdots - x - 1}{1 + x^2 + (x^2)^2/2! + \cdots - 1}$$

$$= \lim_{x\to0}\frac{-x^2/2 + x^3/2 + \cdots}{x^2 + x^4/2 + \cdots} = \lim_{x\to0}\frac{-1/2 + x/2 + \cdots}{1 + x^2/2 + \cdots} = -\frac12.$$
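As a sanity check on the three limits of Example 10.4.1, one can simply evaluate the ratios at small $x$. The sketch below (function names ours) uses `math.log1p` and `math.expm1` to reduce round-off; the chosen $x$ values are arbitrary, and for much smaller $x$ cancellation in the numerators would eventually spoil the result, which is exactly why the series method is preferable.

```python
import math

def ratio_1(x):
    return (2 * math.exp(x) - 2 - 2 * x - x**2) / (math.sin(x) - x)   # -> -2

def ratio_2(x):
    return (math.log1p(x) - x) / (math.cos(x) - 1)                    # -> 1

def ratio_3(x):
    return (math.sqrt(1 + 2 * x) - x - 1) / math.expm1(x**2)          # -> -1/2

for x in (1e-1, 1e-2, 1e-3):
    print(f"x = {x:g}: {ratio_1(x):.5f} {ratio_2(x):.5f} {ratio_3(x):.5f}")
```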


The method of expanding the numerator and denominator of a ratio as
a Taylor series is extremely useful in applications in which mere substitution
results in the indeterminate expression² of the form 0/0. However, there are
many other indeterminate forms that occur in applications. For example, a
mere substitution of $x = 0$ in $(1+x)^{1/x}$ yields $1^\infty$, which is also indeterminate.

Other examples of indeterminate expressions are $0 \times \infty$, $\infty/\infty$, $0^0$, and $\infty^0$. Most
of these expressions can be reduced to indeterminate ratios for which one can
use L’Hopital’s rule:


Box 10.4.1. (L’Hopital’s Rule). If $f(a)/g(a)$ is indeterminate, then

$$\lim_{x\to a}\frac{f(x)}{g(x)} = \lim_{x\to a}\frac{f'(x)}{g'(x)}, \qquad (10.30)$$

provided the limit on the right exists, where $f'$ and $g'$ are derivatives of $f$ and $g$, respectively.

2 An expression is indeterminate if it involves two parts each of which gives a result that
is contradictory to the other. Thus the numerator of the ratio 0/0 says that the ratio should
be zero, while the denominator says that the ratio should be infinite.
In practice, one converts the indeterminate form into a ratio and differentiates the numerator and denominator as many times as necessary until one
obtains a definite result or infinity. The following general rules can be of help:

• If $f(a) = 0$ and $g(a) = \infty$, then to find $\lim_{x\to a} f(x)g(x)$, rewrite the
limit as

$$\lim_{x\to a} f(x)g(x) = \lim_{x\to a}\frac{f(x)}{1/g(x)} \ \left(\to \tfrac{0}{0}\right) \quad\text{or}\quad \lim_{x\to a} f(x)g(x) = \lim_{x\to a}\frac{g(x)}{1/f(x)} \ \left(\to \tfrac{\infty}{\infty}\right),$$

the first of which gives 0/0 and the second ∞/∞. In either case, one can
apply L’Hopital’s rule.

• If $f(a) = 1$ and $g(a) = \infty$, first define $h(x) = [f(x)]^{g(x)}$. Then to find

$$\lim_{x\to a} h(x) = \lim_{x\to a}[f(x)]^{g(x)},$$

take the natural logarithm of $h(x)$ and convert the result into the ratio

$$\lim_{x\to a}\ln[h(x)] = \lim_{x\to a} g(x)\ln[f(x)] = \lim_{x\to a}\frac{\ln[f(x)]}{1/g(x)} \ \left(\to \tfrac{0}{0}\right).$$

Then use Equation (10.30).

• If $f(a) = \infty$ (or $f(a) = 0$) and $g(a) = 0$, then to find

$$\lim_{x\to a} h(x) = \lim_{x\to a}[f(x)]^{g(x)},$$

take the natural logarithm of $h(x)$ and convert the result into the ratio

$$\lim_{x\to a}\ln[h(x)] = \lim_{x\to a} g(x)\ln[f(x)] = \lim_{x\to a}\frac{\ln[f(x)]}{1/g(x)} \ \left(\to \tfrac{\infty}{\infty}\right).$$

Then use Equation (10.30).

Example 10.4.2. To find $\lim_{x\to0}(1 + 2x)^{1/x}$, we write $h(x) = (1 + 2x)^{1/x}$ and
note that

$$\lim_{x\to0}\ln[h(x)] = \lim_{x\to0}(1/x)\ln(1 + 2x) = \lim_{x\to0}\frac{\ln(1 + 2x)}{x}$$

is indeterminate. Using Equation (10.30) yields

$$\lim_{x\to0}\ln[h(x)] = \lim_{x\to0}\frac{\ln(1 + 2x)}{x} = \lim_{x\to0}\frac{2/(1 + 2x)}{1} = \lim_{x\to0}\frac{2}{1 + 2x} = 2.$$

Therefore, $\lim_{x\to0} h(x) = e^2$.

To find $\lim_{x\to0^+} x^x$, we write $h(x) = x^x$ and note that

$$\lim_{x\to0^+}\ln[h(x)] = \lim_{x\to0^+} x\ln x = \lim_{x\to0^+}\frac{\ln x}{1/x}$$

is indeterminate. Using Equation (10.30) yields

$$\lim_{x\to0^+}\ln[h(x)] = \lim_{x\to0^+}\frac{1/x}{-1/x^2} = \lim_{x\to0^+}(-x) = 0.$$

Therefore, $\lim_{x\to0^+} h(x) = e^0 = 1$. So, we have the interesting result $\lim_{x\to0^+} x^x = 1$.
The limit of $x^2/(1 - \cos x)$ as $x$ goes to zero is obtained as follows:

$$\lim_{x\to0}\frac{x^2}{1 - \cos x} = \lim_{x\to0}\frac{2x}{\sin x} = \lim_{x\to0}\frac{2}{\cos x} = 2.$$

Here we had to differentiate twice because the ratio of the first derivatives was also
indeterminate.
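The three results of Example 10.4.2 can likewise be checked by direct evaluation. In the sketch below (function names ours), $(1+2x)^{1/x}$ should approach $e^2 \approx 7.389$, $x^x$ should approach 1 as $x \to 0^+$, and $x^2/(1-\cos x)$ should approach 2. For very small $x$, round-off in $1 - \cos x$ starts to show in the last column, another reminder that analytic evaluation of the limit is the reliable route.

```python
import math

def h1(x):
    return (1 + 2 * x) ** (1 / x)        # -> e^2

def h2(x):
    return x ** x                        # -> 1 as x -> 0+

def h3(x):
    return x**2 / (1 - math.cos(x))      # -> 2

for x in (1e-2, 1e-4, 1e-6):
    print(f"x = {x:g}: {h1(x):.6f} (e^2 = {math.e**2:.6f}), {h2(x):.6f}, {h3(x):.6f}")
```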


It is instructive for the reader to verify all limits in Example 10.4.1 using 
L’Hopital’s rule to appreciate the ease of the Taylor expansion method. 


10.5 Multipole Expansion 

One extremely useful application of the power series representation of func¬ 
tions is in potential theory. The electrostatic or gravitational potential can 
be written as 


$$\Phi(\mathbf{r}) = K\int_\Omega\frac{dQ(\mathbf{r}')}{|\mathbf{r} - \mathbf{r}'|}, \qquad (10.31)$$

where $K$ is $k_e$ for electrostatics and $-G$ for gravity. Similarly, $Q$ represents
either electric charge or mass. In some applications, especially for electrostatic
potential, the distance of the field point $P$ from the origin is much larger than
the distance of the source point $P'$ from the origin. This means that $r \gg r'$,
and we can expand in powers of the ratio $r'/r$, which we denote by $\epsilon$. The
key to this expansion is a power series expansion of $1/|\mathbf{r} - \mathbf{r}'|$. First write


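$$\frac{1}{|\mathbf{r} - \mathbf{r}'|} = \frac{1}{\sqrt{r^2 + r'^2 - 2\mathbf{r}\cdot\mathbf{r}'}} = \frac{1}{r\sqrt{1 + \epsilon^2 - 2\epsilon\,\mathbf{e}_r\cdot\mathbf{e}_{r'}}} = \frac{1}{r}\left(1 + \epsilon^2 - 2\epsilon\,\mathbf{e}_r\cdot\mathbf{e}_{r'}\right)^{-1/2}.$$

Next use the binomial expansion (10.15) with $x = \epsilon^2 - 2\epsilon\,\mathbf{e}_r\cdot\mathbf{e}_{r'}$ and $\alpha = -\frac12$.
Up to second order in $\epsilon$, this yields

$$\frac{1}{|\mathbf{r} - \mathbf{r}'|} = \frac{1}{r}\left[1 - \frac12\left(\epsilon^2 - 2\epsilon\,\mathbf{e}_r\cdot\mathbf{e}_{r'}\right) + \frac38\left(\epsilon^2 - 2\epsilon\,\mathbf{e}_r\cdot\mathbf{e}_{r'}\right)^2 + \cdots\right]$$

$$= \frac{1}{r}\left\{1 + \epsilon\,\mathbf{e}_r\cdot\mathbf{e}_{r'} + \epsilon^2\left[-\frac12 + \frac32(\mathbf{e}_r\cdot\mathbf{e}_{r'})^2\right] + \cdots\right\}$$

$$= \frac{1}{r} + \frac{\mathbf{e}_r\cdot\mathbf{r}'}{r^2} + \frac{r'^2}{r^3}\left[-\frac12 + \frac32(\mathbf{e}_r\cdot\mathbf{e}_{r'})^2\right] + \cdots \qquad (10.32)$$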
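Since (10.32) keeps terms through second order in $\epsilon = r'/r$, the neglected remainder should be of order $\epsilon^3/r$, i.e., it should fall off like $r^{-4}$ for a fixed source point. The sketch below (function names ours; the source point and direction are arbitrary choices) checks this by comparing the exact $1/|\mathbf{r} - \mathbf{r}'|$ with the three-term expansion as the field point moves outward along a fixed direction. It relies on `math.dist` and the multi-argument `math.hypot`, both available from Python 3.8.

```python
import math

def inverse_distance(r_vec, rp_vec):
    return 1 / math.dist(r_vec, rp_vec)

def expansion_10_32(r_vec, rp_vec):
    """Three terms of Eq. (10.32): 1/r + e_r.r'/r^2 + (r'^2/r^3)[-1/2 + (3/2)(e_r.e_r')^2]."""
    r = math.hypot(*r_vec)
    rp = math.hypot(*rp_vec)
    cos_gamma = sum(a * b for a, b in zip(r_vec, rp_vec)) / (r * rp)   # e_r . e_r'
    return 1 / r + rp * cos_gamma / r**2 + rp**2 / r**3 * (-0.5 + 1.5 * cos_gamma**2)

rp_vec = (0.3, -0.2, 0.5)              # source point (arbitrary)
direction = (1.0, 2.0, 2.0)            # field-point direction (arbitrary, length 3)
for scale in (2.0, 20.0, 200.0):
    r_vec = tuple(scale * c / 3 for c in direction)   # |r_vec| = scale
    err = abs(inverse_distance(r_vec, rp_vec) - expansion_10_32(r_vec, rp_vec))
    print(f"r = {scale:6.1f}: |exact - expansion| = {err:.2e}")
```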
Substituting this in Equation (10.31), we obtain

$$\Phi(\mathbf{r}) = \frac{K}{r}\int_\Omega dQ(\mathbf{r}') + \frac{K}{r^2}\mathbf{e}_r\cdot\int_\Omega\mathbf{r}'\,dQ(\mathbf{r}') + \frac{K}{r^3}\int_\Omega r'^2\left[-\frac12 + \frac32(\mathbf{e}_r\cdot\mathbf{e}_{r'})^2\right]dQ(\mathbf{r}') + \cdots$$

$$= \frac{KQ}{r} + \frac{K}{r^2}\mathbf{e}_r\cdot\mathbf{P}_Q + \frac{K}{r^3}\int_\Omega r'^2\left[-\frac12 + \frac32(\mathbf{e}_r\cdot\mathbf{e}_{r'})^2\right]dQ(\mathbf{r}') + \cdots, \qquad (10.33)$$

where

$$Q = \int_\Omega dQ(\mathbf{r}')$$

is the total $Q$ (charge or mass), also called the zeroth $Q$ moment, and

$$\mathbf{P}_Q = \int_\Omega\mathbf{r}'\,dQ(\mathbf{r}') \qquad (10.34)$$

is the first $Q$ moment, which in the case of charge is also called the electric
dipole moment. One can also define higher moments.

If the source of the potential is discrete, the integral in Equation (10.31)
becomes a sum. The steps leading to (10.33) will not change except for switching all the integrals to summations. In particular, the dipole moment of $N$
point sources $\{Q_k\}_{k=1}^N$ located at $\{\mathbf{r}_k\}_{k=1}^N$ turns out to be

$$\mathbf{P}_Q = \sum_{k=1}^N Q_k\mathbf{r}_k. \qquad (10.35)$$

For the special case of two electric charges $q_1 = +q$ and $q_2 = -q$, we obtain³

$$\mathbf{p} = q\mathbf{r}_1 - q\mathbf{r}_2 = q(\mathbf{r}_1 - \mathbf{r}_2). \qquad (10.36)$$

Thus, the dipole moment of a pair of equal charges of opposite sign is the 
magnitude of the charge times the displacement vector from the negative to 
the positive charge. 

Example 10.5.1. Electric dipoles are fairly abundant in Nature. For example, 
an antenna is approximated as a dipole at distances far away from it; and in atomic 
transitions one uses the so-called dipole approximation to calculate the rate of 
transition and the lifetime of a state. 

Let us write the explicit form of the potential of a dipole, i.e., the second term on
the RHS of Equation (10.33). In Cartesian coordinates, in which the dipole moment
is in the $z$-direction (so that $\mathbf{p} = p\,\mathbf{e}_z$), the potential can be written as


3 It is customary to denote the electric dipole moment by p with no subscript. 
$$\Phi_{\text{dip}}(x, y, z) = \frac{k_e}{r^2}\mathbf{e}_r\cdot\mathbf{p} = \frac{k_e}{r^3}\mathbf{r}\cdot\mathbf{p} = \frac{k_e p z}{(x^2 + y^2 + z^2)^{3/2}}.$$

More important is the expression for potential in spherical coordinates:

$$\Phi_{\text{dip}}(r, \theta, \varphi) = \frac{k_e}{r^2}\mathbf{e}_r\cdot\mathbf{p} = \frac{k_e p}{r^2}\mathbf{e}_r\cdot\mathbf{e}_z = \frac{k_e p}{r^2}\cos\theta. \qquad (10.37)$$

The azimuthal symmetry (independence of $\varphi$) comes about because we chose $\mathbf{p}$ to
lie along the $z$-axis.
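The dipole approximation can be tested against the exact potential of two point charges. The minimal sketch below assumes $k_e = 1$ and places $+q$ and $-q$ at $z = \pm d/2$ (illustrative choices; function names ours). It computes $\mathbf{p}$ from (10.35) and compares the exact potential with (10.37) at a distant point; for this symmetric pair the leading correction is of relative order $(d/r)^2$.

```python
import math

K_E = 1.0   # k_e set to 1 for illustration (assumption)

def point_charge_potential(charges, point):
    """Exact potential sum_k K_E q_k / |r - r_k|: the discrete form of Eq. (10.31)."""
    return sum(K_E * q / math.dist(point, pos) for q, pos in charges)

def dipole_moment(charges):
    """Eq. (10.35): p = sum_k q_k r_k."""
    return tuple(sum(q * pos[i] for q, pos in charges) for i in range(3))

q, d = 1.0, 0.1
charges = [(+q, (0.0, 0.0, d / 2)), (-q, (0.0, 0.0, -d / 2))]   # p along +z
p = dipole_moment(charges)          # (0, 0, q*d)

def dipole_potential(r, theta):
    """Eq. (10.37): k_e p cos(theta) / r^2."""
    return K_E * p[2] * math.cos(theta) / r**2

r, theta = 5.0, 0.6
point = (r * math.sin(theta), 0.0, r * math.cos(theta))
print(p, point_charge_potential(charges, point), dipole_potential(r, theta))
```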


10.6 Fourier Series 

Power series are special cases of the series of functions in which the nth function is $(x-a)^n$, or simply $x^n$, multiplied by a constant. These functions,
simple and powerful as they are, cannot be used in all physical applications. More general functions are needed for many problems in theoretical
physics.

The most widely used series of functions in applications are Fourier series,
in which the functions are sines and cosines. These are especially suitable for
periodic functions, which repeat themselves with a certain period. Suppose
that a function $f(x)$ is defined in the interval $(a, b)$. Can we write it as a
series in sines and cosines, as we did in terms of orthogonal polynomials [see
Theorem 7.5.2]? Let $L = b - a$ denote the length of the interval, and consider
the functions

$$\sin\frac{2n\pi x}{L}, \qquad \cos\frac{2n\pi x}{L}.$$

Let us try the series expansion

$$f(x) = a_0 + \sum_{n=1}^{\infty}\left(a_n\cos\frac{2n\pi x}{L} + b_n\sin\frac{2n\pi x}{L}\right), \qquad (10.38)$$

where we have separated the $n = 0$ term. Now the sine and cosine terms have
the following easily obtainable useful properties (with $n \ge 1$ in the first two integrals):

$$\int_a^b\sin\frac{2n\pi x}{L}\,dx = \int_a^b\cos\frac{2n\pi x}{L}\,dx = \int_a^b\sin\frac{2n\pi x}{L}\cos\frac{2m\pi x}{L}\,dx = 0,$$

$$\int_a^b\sin\frac{2n\pi x}{L}\sin\frac{2m\pi x}{L}\,dx = \begin{cases}0 & \text{if } m \neq n,\\ L/2 & \text{if } m = n \neq 0,\end{cases} \qquad \int_a^b\cos\frac{2n\pi x}{L}\cos\frac{2m\pi x}{L}\,dx = \begin{cases}0 & \text{if } m \neq n,\\ L/2 & \text{if } m = n \neq 0.\end{cases} \qquad (10.39)$$


These properties suggest a way of determining the coefficients of the series
for a given function, as in the case of orthogonal polynomials. If we integrate
both sides of Equation (10.38) from $a$ to $b$, we get⁴

$$\int_a^b f(x)\,dx = a_0\int_a^b dx + \sum_{n=1}^{\infty}\Big(a_n\underbrace{\int_a^b\cos\frac{2n\pi x}{L}\,dx}_{=0} + b_n\underbrace{\int_a^b\sin\frac{2n\pi x}{L}\,dx}_{=0}\Big) = (b - a)a_0,$$

or $\int_a^b f(x)\,dx = a_0 L$. This yields

$$a_0 = \frac{1}{L}\int_a^b f(x)\,dx. \qquad (10.40)$$


Multiplying Equation (10.38) by $\cos(2m\pi x/L)$ and integrating both sides from
$a$ to $b$, we obtain

$$\int_a^b f(x)\cos\frac{2m\pi x}{L}\,dx = a_0\int_a^b\cos\frac{2m\pi x}{L}\,dx + \sum_{n=1}^{\infty}\int_a^b\left(a_n\cos\frac{2n\pi x}{L} + b_n\sin\frac{2n\pi x}{L}\right)\cos\frac{2m\pi x}{L}\,dx$$

$$= 0 + \sum_{n=1}^{\infty}a_n\int_a^b\cos\frac{2n\pi x}{L}\cos\frac{2m\pi x}{L}\,dx + \sum_{n=1}^{\infty}b_n\int_a^b\sin\frac{2n\pi x}{L}\cos\frac{2m\pi x}{L}\,dx = a_m L/2,$$

where we used Equation (10.39). This yields (renaming $m$ as $n$)

$$a_n = \frac{2}{L}\int_a^b f(x)\cos\frac{2n\pi x}{L}\,dx. \qquad (10.41)$$


Similarly, multiplying both sides of Equation (10.38) by $\sin(2m\pi x/L)$ and
integrating from $a$ to $b$ yields

$$b_n = \frac{2}{L}\int_a^b f(x)\sin\frac{2n\pi x}{L}\,dx. \qquad (10.42)$$
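Formulas (10.40) through (10.42) can be evaluated numerically when the integrals are awkward to do by hand. The sketch below approximates them with the midpoint rule (our choice of quadrature; the function names and sample count are ours as well) and tests the result on $f(x) = x$ over $(0, 1)$, for which direct integration by parts gives $a_0 = 1/2$, $a_n = 0$, and $b_n = -1/(n\pi)$. The last line evaluates the truncated series (10.38) at $x$ and $x + L$ to illustrate that the series is periodic even though $f(x) = x$ is not.

```python
import math

def fourier_coefficients(f, a, b, n_max, samples=20_000):
    """Approximate a_0, a_n, b_n of Eqs. (10.40)-(10.42) with the midpoint rule."""
    L = b - a
    h = L / samples
    xs = [a + (j + 0.5) * h for j in range(samples)]
    fx = [f(x) for x in xs]
    a0 = sum(fx) * h / L
    an, bn = [], []
    for n in range(1, n_max + 1):
        k = 2 * n * math.pi / L
        an.append(2 / L * h * sum(v * math.cos(k * x) for v, x in zip(fx, xs)))
        bn.append(2 / L * h * sum(v * math.sin(k * x) for v, x in zip(fx, xs)))
    return a0, an, bn

def fourier_partial_sum(coeffs, L, x):
    """Evaluate the truncated series (10.38) at x."""
    a0, an, bn = coeffs
    return a0 + sum(a * math.cos(2 * n * math.pi * x / L) + b * math.sin(2 * n * math.pi * x / L)
                    for n, (a, b) in enumerate(zip(an, bn), start=1))

# Test function f(x) = x on (0, 1): a_0 = 1/2, a_n = 0, b_n = -1/(n*pi).
coeffs = fourier_coefficients(lambda x: x, 0.0, 1.0, n_max=5)
print(coeffs[0], coeffs[1][0], coeffs[2][0], -1 / math.pi)
print(fourier_partial_sum(coeffs, 1.0, 0.3), fourier_partial_sum(coeffs, 1.0, 1.3))
```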




Equations (10.38), (10.40), (10.41), and (10.42) provide a procedure for
representing a function $f(x)$ as a Fourier series. However, the RHS of Equation
(10.38) is periodic. This means that, when the series is used to define $f(x)$ for values of $x$
outside the interval $(a, b)$, the resulting function is periodic. In fact, from Equation (10.38), we have

4 Here we are assuming that the series converges uniformly so that we can switch the
order of integration and summation. This assumption turns out to be correct, but we shall
forgo its (difficult) proof.
$$\begin{aligned}
f(x+L) &= a_0 + \sum_{n=1}^{\infty}\left[a_n\cos\frac{2n\pi(x+L)}{L} + b_n\sin\frac{2n\pi(x+L)}{L}\right]\\
&= a_0 + \sum_{n=1}^{\infty}\left[a_n\cos\left(\frac{2n\pi x}{L}+2n\pi\right) + b_n\sin\left(\frac{2n\pi x}{L}+2n\pi\right)\right]\\
&= a_0 + \sum_{n=1}^{\infty}\left[a_n\cos\frac{2n\pi x}{L} + b_n\sin\frac{2n\pi x}{L}\right] = f(x).
\end{aligned}$$


Thus, f(x) repeats itself at the end of each interval of length L, i.e., it is periodic with period L. Fourier series are especially suited for representing such functions. In fact, any piecewise continuous periodic function has a Fourier series expansion, and the simplicity of sine and cosine functions makes this expansion particularly useful in applications such as electrical engineering and acoustics, where periodic functions in the form of waves and voltages are daily occurrences. Let us look at some examples.⁵
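To see concretely that the series reproduces the periodic extension of f rather than f itself, the sketch below uses f(x) = x² on (0, 1), so L = 1. For this function, Equations (10.40) through (10.42) give a₀ = 1/3, aₙ = 1/(n²π²), and bₙ = −1/(nπ). These values are our own integration by parts, not given in the text. The truncation at N terms is an arbitrary choice.

```python
import math

def partial_sum_x_squared(x, N):
    """Truncated Fourier series (10.38) of f(x) = x**2 on (0, 1), i.e. L = 1.

    Coefficients (hand-computed from Eqs. 10.40-10.42):
    a_0 = 1/3, a_n = 1/(n pi)**2, b_n = -1/(n pi).
    """
    s = 1.0 / 3.0
    for n in range(1, N + 1):
        k = 2 * n * math.pi * x
        s += math.cos(k) / (n * math.pi) ** 2 - math.sin(k) / (n * math.pi)
    return s

N = 1000
inside = partial_sum_x_squared(0.3, N)   # close to f(0.3) = 0.09
outside = partial_sum_x_squared(1.3, N)  # same value: the periodic extension, not 1.3**2
print(inside, outside)
```

Outside (0, 1) the series does not return x², but the value of x² at the corresponding point of the basic interval.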

Example 10.6.1. In the study of electrical circuitry, periodic voltage signals of different shapes are encountered. An example is the so-called square wave of height V₀, whose duration and “rest duration” are both T [see Figure 10.3 (top)]. The potential as a function of time, V(t), can be expanded as a Fourier series. The interval is (0, 2T), because that is one whole cycle of potential variation. We thus write

$$V(t) = a_0 + \sum_{n=1}^{\infty}\left(a_n\cos\frac{2n\pi t}{2T} + b_n\sin\frac{2n\pi t}{2T}\right) \qquad (10.43)$$



Figure 10.3: Top: The periodic square-wave potential with V₀ = 1 and T = 2. Bottom: Various approximations to the Fourier series of the square-wave potential. The dashed plot is that of the first term of the series, the thick gray plot keeps 3 terms, and the solid plot 15 terms.

5 While Taylor series expansion demands that the function be (infinitely) differentiable, the orthogonal polynomial and Fourier series expansions require only piecewise continuity. This means that the function can have any (finite) number of discontinuities in the interval (a, b). Thus, the expanded function may be not only nondifferentiable but even discontinuous.




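with

$$a_0 = \frac{1}{2T}\int_0^{2T}V(t)\,dt, \qquad a_n = \frac{2}{2T}\int_0^{2T}V(t)\cos\frac{2n\pi t}{2T}\,dt = \frac{1}{T}\int_0^{2T}V(t)\cos\frac{n\pi t}{T}\,dt,$$

$$b_n = \frac{1}{T}\int_0^{2T}V(t)\sin\frac{n\pi t}{T}\,dt. \qquad (10.44)$$

Substituting

$$V(t) = \begin{cases} V_0 & \text{if } 0<t<T,\\ 0 & \text{if } T<t<2T,\end{cases}$$

in Equation (10.44), we obtain

$$a_0 = \frac{1}{2T}\int_0^T V_0\,dt = \frac12V_0, \qquad a_n = \frac1T\int_0^T V_0\cos\frac{n\pi t}{T}\,dt = 0,$$

and

$$b_n = \frac1T\int_0^T V_0\sin\frac{n\pi t}{T}\,dt = -\frac{V_0}{T}\,\frac{T}{n\pi}\cos\frac{n\pi t}{T}\bigg|_0^T = \frac{V_0}{n\pi}(1-\cos n\pi) = \frac{V_0}{n\pi}\left[1-(-1)^n\right].$$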


Thus, there is no contribution from the cosine sum, and in the sine sum only the odd terms contribute (bₙ = 0 if n is even). Therefore, let n = 2k + 1, where k = 0, 1, 2, … now takes all values, even and odd, and substitute all the above information in Equation (10.43) to obtain


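$$\begin{aligned}
V(t) &= \frac12V_0 + \sum_{k=0}^{\infty}\frac{V_0}{(2k+1)\pi}\left[1-(-1)^{2k+1}\right]\sin\frac{(2k+1)\pi t}{T}\\
&= \frac{V_0}{2}\left\{1 + \frac{4}{\pi}\sum_{k=0}^{\infty}\frac{\sin[(2k+1)\pi t/T]}{2k+1}\right\}.
\end{aligned}$$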


The plots of the sum truncated at the first, third, and fifteenth terms are shown in Figure 10.3 (bottom). Note how the Fourier approximation overshoots the value of the function near the discontinuities. This is a general feature of the Fourier series of discontinuous functions and is called the Gibbs phenomenon.⁶ ■
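The overshoot can be measured numerically. The sketch below sums the square-wave series of this example with an arbitrary number K of odd harmonics and scans a fine grid just to the right of the jump at t = 0. The values V₀ = 1 and T = 2 match Figure 10.3. K and the grid are our own choices. For large K the peak settles near 1.09 V₀, which is an overshoot of about 9% of the jump.

```python
import math

def square_wave_partial(t, K, V0=1.0, T=2.0):
    """Square-wave series of Example 10.6.1 truncated after K odd harmonics:
    V(t) ~ (V0/2) * (1 + (4/pi) * sum_{k<K} sin((2k+1) pi t / T) / (2k+1)).
    """
    s = sum(math.sin((2 * k + 1) * math.pi * t / T) / (2 * k + 1) for k in range(K))
    return 0.5 * V0 * (1 + 4 / math.pi * s)

V0, T, K = 1.0, 2.0, 100
# Deep inside the "on" half-cycle the truncated series is close to V0.
mid = square_wave_partial(T / 2, K, V0, T)
# Just to the right of the jump at t = 0 it overshoots; scan a fine grid near the jump.
grid = [i * (T / 50) / 2000 for i in range(1, 2001)]
peak = max(square_wave_partial(t, K, V0, T) for t in grid)
print(mid, peak)  # the peak lies roughly 9% of the jump V0 above V0
```

Increasing K moves the peak closer to the discontinuity, but it does not reduce its height. This is the characteristic behavior of the Gibbs phenomenon.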


Example 10.6.2. Another frequently used potential is the sawtooth potential. The interval is (0, T) and the equation for the potential is

$$V(t) = V_0\frac{t}{T} \quad\text{for } 0<t<T.$$

6 A discussion of the Gibbs phenomenon can be found in Hassani, S. Mathematical Physics: A Modern Introduction to Its Foundations, Springer-Verlag, 1999, Chapter 8.
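The coefficients of expansion can be obtained as usual:

$$a_0 = \frac1T\int_0^T V_0\frac{t}{T}\,dt = \frac12V_0,$$

$$a_n = \frac{2}{T}\int_0^T V_0\frac{t}{T}\cos\frac{2n\pi t}{T}\,dt = \frac{2V_0}{T^2}\left[\frac{T}{2n\pi}\,t\sin\frac{2n\pi t}{T}\bigg|_0^T - \frac{T}{2n\pi}\int_0^T\sin\frac{2n\pi t}{T}\,dt\right] = 0,$$

and

$$b_n = \frac{2}{T}\int_0^T V_0\frac{t}{T}\sin\frac{2n\pi t}{T}\,dt = \frac{2V_0}{T^2}\left[-\frac{T}{2n\pi}\,t\cos\frac{2n\pi t}{T}\bigg|_0^T + \frac{T}{2n\pi}\int_0^T\cos\frac{2n\pi t}{T}\,dt\right] = -\frac{V_0}{n\pi}.$$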


Substituting the coefficients in the sum, we get

$$V(t) = \frac12V_0 - \sum_{n=1}^{\infty}\frac{V_0}{n\pi}\sin\frac{2n\pi t}{T} = \frac{V_0}{2}\left[1 - \frac{2}{\pi}\sum_{n=1}^{\infty}\frac{\sin(2n\pi t/T)}{n}\right].$$



The plot of the sawtooth wave, as well as those of the sum truncated at the first, third, and fifteenth terms, are shown in Figure 10.4. ■
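A quick numerical cross-check of the sawtooth result is sketched below. It approximates bₙ with a midpoint rule and compares the values with −V₀/(nπ). It also evaluates the truncated series at t = T/4, where the sawtooth equals V₀/4. The function names, V₀ = 1, T = 2, and the sample and term counts are illustrative assumptions. Convergence of the series is slow, with an error of order V₀/(πN).

```python
import math

def sawtooth(t, V0=1.0, T=2.0):
    """Sawtooth potential of Example 10.6.2, V(t) = V0 t / T on (0, T)."""
    return V0 * t / T

def coefficient_b(n, V0=1.0, T=2.0, samples=20000):
    """Midpoint-rule approximation of b_n = (2/T) * integral_0^T V(t) sin(2 n pi t / T) dt."""
    h = T / samples
    total = 0.0
    for i in range(samples):
        t = (i + 0.5) * h
        total += sawtooth(t, V0, T) * math.sin(2 * n * math.pi * t / T)
    return 2 / T * total * h

def sawtooth_series(t, N, V0=1.0, T=2.0):
    """Truncated series V0/2 - (V0/pi) * sum_{n=1}^{N} sin(2 n pi t / T) / n."""
    return V0 / 2 - V0 / math.pi * sum(math.sin(2 * n * math.pi * t / T) / n for n in range(1, N + 1))

V0, T = 1.0, 2.0
for n in range(1, 6):
    # Numerical b_n next to the closed form -V0 / (n pi) from the example.
    print(n, coefficient_b(n, V0, T), -V0 / (n * math.pi))
print(sawtooth_series(T / 4, 2000, V0, T), sawtooth(T / 4, V0, T))
```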




Figure 10.4: Top: The periodic sawtooth potential with Vo = 1 and T = 2. Bottom: 
Various approximations to the Fourier series of the sawtooth potential. The dashed plot 
is that of the first term of the series, the thick gray plot keeps 3 terms, and the solid 
plot 15 terms. 


Although Euler made use of the trigonometric series as early as 1729, and d'Alembert considered the problem of the expansion of the reciprocal of the distance between two planets in a series of cosines of the multiples of the angle between the rays from the origin to the two planets, it was Fourier who gave a systematic account of the trigonometric series.

"The profound study of nature is the most fruitful source of mathematical discoveries." Joseph Fourier

Joseph Fourier
1768-1830

Joseph Fourier did very well as a young student of mathematics but had set his heart on becoming an army officer. Denied a commission because he was the son of a tailor, he went to a Benedictine school with the hope that he could continue studying mathematics at its seminary in Paris. The French Revolution changed those plans and set the stage for many of the personal circumstances of Fourier’s later years, due in part to his courageous defense of some of its victims, an action that led to his arrest in 1794. He was released later that year, and he enrolled as a student in the École Normale, which opened and closed within a year. His performance there, however, was enough to earn him a position as assistant lecturer (under Lagrange and Monge) in the École Polytechnique. He was an excellent mathematical physicist, was a friend of Napoleon, and accompanied him in 1798 to Egypt, where Fourier held various diplomatic and administrative posts while also conducting research. Napoleon took note of his accomplishments and, on Fourier’s return to France in 1801, appointed him prefect of the district of Isère, in southeastern France. In this capacity Fourier built the first real road from Grenoble to Turin. He also befriended the boy Champollion, who later deciphered the Rosetta stone as the first long step toward understanding the hieroglyphic writing of the ancient Egyptians.

Like other scientists of his time, Fourier took up the flow of heat. The flow was 
of interest as a practical problem in the handling of metals in industry and as a 
scientific problem in attempts to determine the temperature at the interior of the 
Earth, the variation of that temperature with time, and other such questions. He 
submitted a basic paper on heat conduction to the Academy of Sciences of Paris 
in 1807. The paper was judged by Lagrange, Laplace, and Legendre, and was not 
published, mainly due to the objections of Lagrange, who had earlier rejected the 
use of trigonometric series. But the Academy did wish to encourage Fourier to 
develop his ideas, and so made the problem of the propagation of heat the subject 
of a grand prize to be awarded in 1812. Fourier submitted a revised paper in 1811, 
which was judged by the men already mentioned, and others. It won the prize but 
was criticized for its lack of rigor and so was not published at that time in the 
Memoires of the Academy. 

He developed a mastery of clear notation, some of which is still in use today. (The 
placement of the limits of integration near the top and bottom of the integral sign was 
introduced by Fourier.) It was also his habit to maintain close association between 
mathematical relations and physically measurable quantities, especially in limiting 
or asymptotic cases, even performing some of the experiments himself. He was 
one of the first to begin full incorporation of physical constants into his equations, 
and made considerable strides toward the modern ideas of units and dimensional 
analysis. 

Fourier continued to work on the subject of heat and, in 1822, published one of 
the classics of mathematics, Theorie Analytique de la Chaleur, in which he made 
extensive use of the series that now bears his name and incorporated the first part 
of his 1811 paper practically without change. Two years later he became secretary 
of the Academy and was able to have his 1811 paper published in its original form 
in the Memoires. 
10.7 Multivariable Taylor Series

The approximation to which we alluded at the beginning of this chapter is 
just as important when we are dealing with functions depending on several 
variables as those depending on a single variable. After all, most functions 
encountered in physics depend on space coordinates and time. We begin with 
two variables because the generalization to several variables will be trivial 
once we understand the two-variable case. 

A direct and obvious generalization of the power series to the case of a function f(u, v) of two variables about the point (u₀, v₀) gives

$$\begin{aligned}
f(u,v) = {}& a_{00} + a_{10}(u-u_0) + a_{01}(v-v_0) + a_{20}(u-u_0)^2\\
&+ a_{02}(v-v_0)^2 + a_{11}(u-u_0)(v-v_0) + a_{30}(u-u_0)^3\\
&+ a_{21}(u-u_0)^2(v-v_0) + a_{12}(u-u_0)(v-v_0)^2 + a_{03}(v-v_0)^3 + \cdots
\end{aligned} \qquad (10.45)$$

The notation used above needs some explanation. All the a's are constants with two indices such that the first index indicates the power of (u − u₀) and the second that of (v − v₀). To obtain a Taylor series, we need to relate the a's to derivatives of f. This is straightforward: To find a_{kj}, differentiate both sides of Equation (10.45) k times with respect to u and j times with respect to v, and evaluate the result at (u₀, v₀). Thus, to evaluate a₀₀, we differentiate zero times with respect to u and zero times with respect to v and substitute u₀ for u and v₀ for v on both sides. We then obtain

$$f(u_0,v_0) = a_{00} + 0 + 0 + \cdots + 0 + \cdots = a_{00}.$$

By differentiating with respect to u and evaluating both sides at (u₀, v₀), we obtain (here ∂₁ ≡ ∂/∂u and ∂₂ ≡ ∂/∂v)

$$\partial_1f(u_0,v_0) = 0 + a_{10} + 0 + \cdots + 0 + \cdots = a_{10}.$$

Similarly,

$$\begin{aligned}
\partial_2f(u_0,v_0) &= 0 + 0 + a_{01} + 0 + \cdots + 0 + \cdots = a_{01},\\
\partial_1\partial_1f(u_0,v_0) &= \partial_1^2f(u_0,v_0) = 2a_{20},\\
\partial_2\partial_2f(u_0,v_0) &= \partial_2^2f(u_0,v_0) = 2a_{02},\\
\partial_2\partial_1f(u_0,v_0) &= a_{11}.
\end{aligned}$$


We want to write Equation (10.45) in a succinct form to be able to extract a general formula for the coefficients. An inspection of that equation suggests that

$$f(u,v) = \sum_{j=0}^{\infty}\sum_{k=0}^{\infty}a_{jk}(u-u_0)^j(v-v_0)^k.$$


It is more useful to collect terms of equal total power together. Thus, writing m = k + j, and noting that j cannot be larger than m, we rewrite the above equation as

$$f(u,v) = \sum_{m=0}^{\infty}\sum_{j=0}^{m}a_{j,m-j}(u-u_0)^j(v-v_0)^{m-j}.$$

Let us introduce the notation ∂_{k,n−k} for k differentiations with respect to the first variable and n − k differentiations with respect to the second:⁷

$$\partial_{k,n-k}f \equiv \frac{\partial^nf}{\partial u^k\,\partial v^{n-k}},$$

and apply it to both sides of the sum above. Evaluating the result at (u₀, v₀), we obtain

$$\partial_{k,n-k}f(u_0,v_0) = \sum_{m=0}^{\infty}\sum_{j=0}^{m}a_{j,m-j}\,\partial_{k,n-k}\left\{(u-u_0)^j(v-v_0)^{m-j}\right\}\Big|_{(u_0,v_0)}.$$


If j < k or m − j < n − k, then the corresponding terms differentiate to zero. On the other hand, if j > k or m − j > n − k, then some powers of u − u₀ or v − v₀ will survive, and evaluation at (u₀, v₀) will also give zero. So, the only term in the sum that survives the differentiation is the term with j = k and m − j = n − k, which gives k!(n − k)! a_{k,n−k}. We thus have


$$\partial_{k,n-k}f(u_0,v_0) = k!\,(n-k)!\,a_{k,n-k} \quad\Longrightarrow\quad a_{k,n-k} = \frac{\partial_{k,n-k}f(u_0,v_0)}{k!\,(n-k)!},$$

and the Taylor series can finally be written as

$$f(u,v) = \sum_{n=0}^{\infty}\sum_{k=0}^{n}\frac{\partial_{k,n-k}f(u_0,v_0)}{k!\,(n-k)!}(u-u_0)^k(v-v_0)^{n-k}. \qquad (10.46)$$


Sometimes this is written in terms of increments to suggest approximation as in the single-variable case:

$$f(u+\Delta u,\,v+\Delta v) = \sum_{n=0}^{\infty}\sum_{k=0}^{n}\frac{\partial_{k,n-k}f(u,v)}{k!\,(n-k)!}(\Delta u)^k(\Delta v)^{n-k}, \qquad (10.47)$$

where we used (u, v) instead of (u₀, v₀). Once again, the first term in the expansion is f(u, v) and the rest is a correction.

The three-dimensional formula should now be easy to construct. We write this as⁸

$$f(u,v,w) = \sum_{n=0}^{\infty}\sum_{i+j+k=n}\frac{\partial_{ijk}f(u_0,v_0,w_0)}{i!\,j!\,k!}(u-u_0)^i(v-v_0)^j(w-w_0)^k. \qquad (10.48)$$

7 This notation is not universal. Sometimes ∂_{kj} is used with the understanding that k + j = n.

8 The symbol ∂_{ijk} represents the nth derivative with i differentiations with respect to the first variable, j differentiations with respect to the second variable, and k differentiations with respect to the third variable, such that i + j + k = n.
For a given value of n, specified by the outer sum, the inner sum groups together all terms whose indices i, j, and k add up to n. For comparison, we also write Equation (10.46) in this notation:

$$f(u,v) = \sum_{n=0}^{\infty}\sum_{j+k=n}\frac{\partial_{jk}f(u_0,v_0)}{j!\,k!}(u-u_0)^j(v-v_0)^k. \qquad (10.49)$$

The three-dimensional Taylor series in terms of increments becomes

$$f(u+\Delta u,\,v+\Delta v,\,w+\Delta w) = \sum_{n=0}^{\infty}\sum_{i+j+k=n}\frac{\partial_{ijk}f(u,v,w)}{i!\,j!\,k!}(\Delta u)^i(\Delta v)^j(\Delta w)^k, \qquad (10.50)$$

where again (u₀, v₀, w₀) has been replaced by (u, v, w).
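The double sum in (10.50) can be mechanized directly. The sketch below enumerates the index triples with i + j + k = n for the inner sum and applies the formula to a test function of our own choosing, f(u, v, w) = exp(c₁u + c₂v + c₃w), expanded about the origin. For this function every derivative is known in closed form, ∂_{ijk}f(0, 0, 0) = c₁ⁱc₂ʲc₃ᵏ, so the truncated series can be compared with the exact value. The function names and the constants c are hypothetical.

```python
import math
from itertools import product

def multi_indices(n, dims=3):
    """All tuples of `dims` nonnegative integers summing to n: the index set of the inner sum."""
    return [idx for idx in product(range(n + 1), repeat=dims) if sum(idx) == n]

def taylor_exp_linear(du, dv, dw, order, c=(1.0, 2.0, 3.0)):
    """Eq. (10.50) truncated at total order `order`, for the illustrative test function
    f(u, v, w) = exp(c1*u + c2*v + c3*w) expanded about (0, 0, 0).

    For this f, d_ijk f(0, 0, 0) = c1**i * c2**j * c3**k.
    """
    total = 0.0
    for n in range(order + 1):            # outer sum over the total order n
        for i, j, k in multi_indices(n):  # inner sum over i + j + k = n
            deriv = c[0] ** i * c[1] ** j * c[2] ** k
            total += deriv * du ** i * dv ** j * dw ** k / (
                math.factorial(i) * math.factorial(j) * math.factorial(k))
    return total

print(len(multi_indices(2)))  # 6 second-order terms in three variables
approx = taylor_exp_linear(0.1, -0.05, 0.02, order=8)
exact = math.exp(1.0 * 0.1 + 2.0 * -0.05 + 3.0 * 0.02)
print(approx, exact)
```

For this particular f, each inner sum collapses (by the multinomial theorem) to (c₁Δu + c₂Δv + c₃Δw)ⁿ/n!. That is why a few orders already match the exponential closely.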

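Example 10.7.1. As an example we expand eˣ sin y about the origin.⁹ Using the notation in Equation (10.49), the coefficients, within a factor of j!k!, can be written as

$$\partial_{jk}(e^x\sin y)\Big|_{(0,0)} = \frac{\partial^n}{\partial x^j\,\partial y^k}(e^x\sin y)\bigg|_{(0,0)} = \frac{d^j}{dx^j}(e^x)\bigg|_{x=0}\,\frac{d^k}{dy^k}(\sin y)\bigg|_{y=0} = \frac{d^k}{dy^k}(\sin y)\bigg|_{y=0}.$$

These derivatives vanish for even k and equal (−1)^{(k−1)/2} for odd k.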

The first few terms of the Taylor expansion of this function can now be written down:

$$e^x\sin y = y + xy + \frac{x^2y}{2} - \frac{y^3}{6} + \frac{x^3y}{6} - \frac{xy^3}{6} + \cdots.$$

One could also obtain this result by multiplying the Taylor expansions of eˣ and sin y term by term. ■
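The coefficient rule of this example is easy to automate with exact fractions. Each coefficient of xʲyᵏ is sin⁽ᵏ⁾(0)/(j!k!). The sketch below (function names are our own) builds the terms through total order 4 and checks them against eˣ sin y at a small point.

```python
from fractions import Fraction
from math import factorial, exp, sin

def sin_derivative_at_zero(k):
    """k-th derivative of sin y at y = 0: cycles through 0, 1, 0, -1."""
    return (0, 1, 0, -1)[k % 4]

def taylor_coefficients(max_order):
    """Coefficients c[(j, k)] of x**j y**k in e^x sin y about (0, 0), from Eq. (10.49).

    Since every x-derivative of e^x is 1 at x = 0, the (j, k) derivative is sin^(k)(0).
    """
    coeffs = {}
    for n in range(max_order + 1):
        for j in range(n + 1):
            k = n - j
            d = sin_derivative_at_zero(k)
            if d != 0:
                coeffs[(j, k)] = Fraction(d, factorial(j) * factorial(k))
    return coeffs

c = taylor_coefficients(4)
print(sorted(c.items()))
x, y = 0.1, 0.2
approx = sum(float(v) * x ** j * y ** k for (j, k), v in c.items())
print(approx, exp(x) * sin(y))
```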


10.8 Application to Differential Equations

One of the most powerful methods of solving an ordinary differential equation (ODE) is the power series method, and we shall use this method to solve some of the most frequently recurring differential equations of mathematical physics in Chapters 25 through 27. Within its interval of convergence, a power series converges absolutely (and uniformly on every closed subinterval) and can be differentiated term by term. This makes power series good candidates for representing the (unknown) solutions of differential equations. The relation among the derivatives, expressed in a differential equation, becomes a relation among the coefficients of the power series, the so-called recursion relation. This relation is enough to determine all the relevant coefficients of the series, leaving only those coefficients which require initial conditions for their determination. The best way to understand the method is to look at an example.


9 The use of x and y in place of u and v should not cause any confusion.


Example 10.8.1. The differential equation

$$\frac{dx}{dt} = bx$$

can be assumed to have a power series solution of the form

$$x(t) = \sum_{n=0}^{\infty}c_nt^n.$$

This power series will be uniformly and absolutely convergent for some interval on the real line, and as such, can be differentiated. Differentiating the foregoing equation and substituting the result in the differential equation, we get

$$\sum_{n=1}^{\infty}nc_nt^{n-1} = b\sum_{n=0}^{\infty}c_nt^n.$$

The essential property of power series is the equality of the corresponding coefficients when two such series are equal (see Theorem 10.1.4). Before using this property in the above equation, however, we need to reexpress the LHS so that the power of t is the same on both sides. We thus change the dummy index from n to m = n − 1, so that all n's are replaced by m + 1. We then get

$$\text{LHS} = \sum_{m+1=1}^{\infty}(m+1)c_{m+1}t^m = \sum_{m=0}^{\infty}(m+1)c_{m+1}t^m.$$

Since we are free to use any dummy index we please, let us change m to n so that we can compare the two sides of the equation. This gives

$$\sum_{n=0}^{\infty}(n+1)c_{n+1}t^n = \sum_{n=0}^{\infty}bc_nt^n \quad\Longrightarrow\quad (n+1)c_{n+1} = bc_n. \qquad (10.51)$$


We can immediately test for the convergence of the series using the ratio test:

$$\lim_{n\to\infty}\left|\frac{c_{n+1}t^{n+1}}{c_nt^n}\right| = \lim_{n\to\infty}\left|\frac{t\,bc_n/(n+1)}{c_n}\right| = \lim_{n\to\infty}\left|\frac{bt}{n+1}\right| = 0$$

for all b and t. Thus, regardless of the values of b and t, the series converges.

We have established the convergence of the series representation of the solution of our differential equation. We now have to find the coefficients. This is done by rewriting Equation (10.51) as

$$c_{n+1} = \frac{b}{n+1}c_n, \qquad (10.52)$$


which is called the recursion relation of the series. By iterating this relation we can obtain all the coefficients in terms of the first one as follows:

$$c_{n+1} = \frac{b}{n+1}c_n = \frac{b}{n+1}\left(\frac{b}{n}c_{n-1}\right) = \frac{b^2}{(n+1)n}c_{n-1} = \frac{b^2}{(n+1)n}\left(\frac{b}{n-1}c_{n-2}\right) = \frac{b^3}{(n+1)n(n-1)}c_{n-2}.$$
Since we are interested in finding cₙ, we can rewrite this equation as

$$c_n = \frac{b^3}{n(n-1)(n-2)}c_{n-3},$$

where we have lowered all n's on both sides by one unit. This relation can easily be generalized to an arbitrary positive integer j:

$$c_n = \frac{b^j}{n(n-1)\cdots(n-j+1)}c_{n-j}.$$

In particular, if we set j = n, we obtain

$$c_n = \frac{b^n}{n(n-1)\cdots2\cdot1}c_0 = \frac{b^n}{n!}c_0,$$

which, upon substitution in the original series, yields

$$x(t) = \sum_{n=0}^{\infty}\frac{b^n}{n!}c_0t^n = c_0\sum_{n=0}^{\infty}\frac{(bt)^n}{n!} = c_0e^{bt}, \qquad (10.53)$$

where we have used Equation (10.12). The unknown c₀ is determined by the value of x(t) at a given t, usually t = 0. ■
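The "almost mechanical" character of the method shows up clearly in code. The sketch below generates the coefficients from the recursion relation (10.52), starting from c₀. It sums a fixed number of terms and compares the result with c₀e^{bt} from (10.53). The function name and the choice of 60 terms are our own assumptions, adequate for the moderate values of bt used here.

```python
import math

def series_solution(b, t, c0, terms=60):
    """Sum x(t) = sum c_n t^n with c_{n+1} = b c_n / (n + 1), the recursion (10.52)."""
    c = c0
    total = 0.0
    for n in range(terms):
        total += c * t ** n
        c = b * c / (n + 1)  # next coefficient from the recursion relation
    return total

b, t, c0 = 0.7, 2.0, 1.5
print(series_solution(b, t, c0), c0 * math.exp(b * t))
```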


There are of course much easier ways of solving the simple differential equation above, and the method used may appear to “kill a fly with a sledgehammer.” Nevertheless, it illustrates the almost mechanical way of obtaining the solution without resorting to any of the “tricks” used so often in arriving at the closed-form solutions of differential equations.


Example 10.8.2. Let us look at another familiar example. The motion of a mass m driven by a spring with spring constant k is governed by the differential equation

$$m\frac{d^2x}{dt^2} = -kx \quad\text{or}\quad \frac{d^2x}{dt^2} + \frac{k}{m}x = 0.$$


Once again we assume a solution of the form

$$x(t) = \sum_{n=0}^{\infty}a_nt^n = a_0 + a_1t + a_2t^2 + \cdots + a_nt^n + \cdots$$

and differentiate it twice to get

$$\frac{dx}{dt} = \sum_{n=1}^{\infty}na_nt^{n-1} = a_1 + 2a_2t + \cdots + na_nt^{n-1} + \cdots,$$

$$\frac{d^2x}{dt^2} = \sum_{n=2}^{\infty}n(n-1)a_nt^{n-2} = 2a_2 + 3\cdot2\,a_3t + \cdots + n(n-1)a_nt^{n-2} + \cdots.$$

Substitute j = n − 2 to bring the power of t into a form that can be compared with the RHS. This amounts to substituting j + 2 for all n's:

$$\frac{d^2x}{dt^2} = \sum_{j=0}^{\infty}(j+2)(j+1)a_{j+2}t^j = \sum_{n=0}^{\infty}(n+2)(n+1)a_{n+2}t^n.$$






In the last step we simply changed the dummy index. Substituting this and the series for x(t) in the differential equation, we obtain

$$\sum_{n=0}^{\infty}(n+2)(n+1)a_{n+2}t^n + \frac{k}{m}\sum_{n=0}^{\infty}a_nt^n = 0,$$

which gives the recursion relation

$$(n+2)(n+1)a_{n+2} + \frac{k}{m}a_n = 0 \quad\Longrightarrow\quad a_{n+2} = -\frac{k/m}{(n+2)(n+1)}a_n. \qquad (10.54)$$


Application of the ratio test [as given by Equation (9.10) with j = 2] immediately shows that the series is convergent for all values of k/m and all values of t. If we lower the value of n by two units on both sides, we get

$$\begin{aligned}
a_n &= -\frac{k/m}{n(n-1)}a_{n-2} = -\frac{k/m}{n(n-1)}\left(-\frac{k/m}{(n-2)(n-3)}a_{n-4}\right) = \frac{(-k/m)^2}{n(n-1)(n-2)(n-3)}a_{n-4}\\
&= \frac{(-k/m)^2}{n(n-1)(n-2)(n-3)}\left(-\frac{k/m}{(n-4)(n-5)}a_{n-6}\right) = \frac{(-k/m)^3}{n(n-1)(n-2)(n-3)(n-4)(n-5)}a_{n-6}\\
&= \cdots = \frac{(-k/m)^l}{n(n-1)\cdots(n-2l+1)}a_{n-2l},
\end{aligned}$$

where l is some positive integer. Because of the form of this equation, we should consider two cases. For even n, we let l = n/2, or n = 2l, to obtain

$$a_{2l} = \frac{(-k/m)^l}{2l(2l-1)\cdots2\cdot1}a_0 = \frac{(-k/m)^l}{(2l)!}a_0,$$

and for odd n we let l = (n − 1)/2, or n = 2l + 1, to get

$$a_{2l+1} = \frac{(-k/m)^l}{(2l+1)(2l)\cdots2\cdot1}a_1 = \frac{(-k/m)^l}{(2l+1)!}a_1.$$


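Thus all even coefficients are given in terms of a₀, and all odd ones in terms of a₁. Absolute convergence of the series now allows us to rearrange terms and separate even and odd terms to write

$$\begin{aligned}
x(t) &= \sum_{n\ \text{even}}a_nt^n + \sum_{n\ \text{odd}}a_nt^n = \sum_{j=0}^{\infty}a_{2j}t^{2j} + \sum_{j=0}^{\infty}a_{2j+1}t^{2j+1}\\
&= a_0\sum_{j=0}^{\infty}\frac{(-k/m)^j}{(2j)!}t^{2j} + a_1\sum_{j=0}^{\infty}\frac{(-k/m)^j}{(2j+1)!}t^{2j+1}\\
&= a_0\sum_{j=0}^{\infty}\frac{(-1)^j}{(2j)!}\left(\sqrt{k/m}\,t\right)^{2j} + \frac{a_1}{\sqrt{k/m}}\sum_{j=0}^{\infty}\frac{(-1)^j}{(2j+1)!}\left(\sqrt{k/m}\,t\right)^{2j+1}\\
&= A\cos\left(\sqrt{k/m}\,t\right) + B\sin\left(\sqrt{k/m}\,t\right),
\end{aligned}$$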
where A = a₀ and B = a₁/√(k/m) are arbitrary constants to be determined by the initial conditions of the problem. The Maclaurin series for sine and cosine used above are given in Equations (10.13) and (10.14). ■
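The sketch below iterates the recursion (10.54) from a₀ and a₁ and sums the resulting series. It then compares the result with A cos(√(k/m) t) + B sin(√(k/m) t), where A = a₀ and B = a₁/√(k/m). The numerical values of k/m, a₀, a₁, and t are arbitrary choices made for illustration.

```python
import math

def oscillator_series(t, a0, a1, k_over_m, terms=80):
    """Sum x(t) = sum a_n t^n using a_{n+2} = -(k/m) a_n / ((n+2)(n+1)), Eq. (10.54)."""
    a = [a0, a1]
    for n in range(terms - 2):
        a.append(-k_over_m * a[n] / ((n + 2) * (n + 1)))
    return sum(coef * t ** n for n, coef in enumerate(a))

k_over_m, a0, a1, t = 4.0, 0.5, -1.2, 1.7
omega = math.sqrt(k_over_m)
# Closed form A cos(omega t) + B sin(omega t) with A = a0 and B = a1 / omega.
closed = a0 * math.cos(omega * t) + a1 / omega * math.sin(omega * t)
print(oscillator_series(t, a0, a1, k_over_m), closed)
```

Note that the recursion never mixes even and odd indices. This mirrors the split of the solution into a cosine part fixed by a₀ and a sine part fixed by a₁.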

The examples above, although illustrating the utility of the power series method of solving differential equations, should not give the impression that one needs no other methods. The closed-form solutions are sometimes essential for interpreting the physical properties of the system under consideration. For example, if the mass of the preceding example is in a fluid, so that a damping force retards the motion, the closed-form solution will turn out to be

$$x(t) = Ae^{-\gamma t}\cos(\omega t + \alpha), \qquad \omega = \sqrt{\frac{k}{m} - \gamma^2},$$

where γ is the damping factor and α is an arbitrary phase. Deciphering this closed form from its power series expansion, obtained by solving the differential equation by the series method, is next to impossible. The closed-form solution shows clearly, for instance, how the amplitude of the oscillation decreases with time, information that may not be evident from the series solution of the problem. Nevertheless, on many occasions a closed-form solution may not be available, in which case the power series solution will be the only alternative. In fact, many of the functions of mathematical physics were invented in the last century as the power series solutions of differential equations.
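To illustrate the point, the sketch below assumes the damped equation has the form x'' + 2γx' + (k/m)x = 0. This form is our assumption; it is the linear equation whose solutions are Ae^{−γt}cos(ωt + α) with ω = √(k/m − γ²). Substituting x = Σ aₙtⁿ gives the recursion a_{n+2} = −[2γ(n+1)a_{n+1} + (k/m)aₙ]/[(n+2)(n+1)]. Summing that series reproduces the closed form numerically, even though the exponential decay of the amplitude is nowhere visible in the coefficients themselves. The parameter values are illustrative.

```python
import math

def damped_series(t, a0, a1, gamma, k_over_m, terms=100):
    """Power-series solution of x'' + 2*gamma*x' + (k/m)*x = 0 (assumed form of the damped equation).

    Recursion: a_{n+2} = -(2*gamma*(n+1)*a_{n+1} + (k/m)*a_n) / ((n+2)*(n+1)).
    """
    a = [a0, a1]
    for n in range(terms - 2):
        a.append(-(2 * gamma * (n + 1) * a[n + 1] + k_over_m * a[n]) / ((n + 2) * (n + 1)))
    return sum(coef * t ** n for n, coef in enumerate(a))

A, alpha, gamma, k_over_m = 1.0, 0.3, 0.2, 4.0
omega = math.sqrt(k_over_m - gamma ** 2)
# Initial data matching x(t) = A exp(-gamma t) cos(omega t + alpha):
a0 = A * math.cos(alpha)                                        # x(0)
a1 = -A * (gamma * math.cos(alpha) + omega * math.sin(alpha))  # x'(0)
t = 3.0
closed = A * math.exp(-gamma * t) * math.cos(omega * t + alpha)
print(damped_series(t, a0, a1, gamma, k_over_m), closed)
```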



10.9 Problems 


10.1. Write the first five terms of the expansion of the binomial function 
(10.15) for (a) a = |, (b) a = and (c) a = |. 

10.2. Find the rational number of which each of the following decimal numbers is a representation:

(a) 0.5555.... (b) 0.676767.... (c) 0.123123 .... 

(d) 1.1111.... (e) 2.727272 .... (f) 1.108108 .... 

10.3. Find the interval of convergence of the Maclaurin series for each of the 
familiar functions discussed in Section 10.2. 


10.4. Using the series representation of the familiar functions evaluate the 
following series: 


(a) EZi 
(d) , 


( -l) k x 2k+1 
2k 

(- l) n - 1 x 3n ~ 2 
n3 n 


(b) £r= 0 
(e) 0 


,.2t + l 

W’ 

(-l) n+2 X 3n+1 

3 3 ” +1 (2n)! 


(c) Er= 0 (!w- 

(f\ y-'C’O x m+1 
2-^m—0 (2m+l)! ' 




10.5. Derive Equation (10.17). 
10.6. Use the Maclaurin series to find the limits of the following ratios as x → 0:

$$\frac{2\sqrt{1-x^2} + x^2 - 2}{2\cos x - 2 + x^2}, \qquad \frac{\sin x - \ln(1+x)}{e^x - x - \cos x}.$$

10 . 7 . (a) Use the Maclaurin series expansion up to x 3 to find the following 
limit: 

2 vU — 6x — 2 cos x + 4 sin x + 7x 2 

lim-— 7 --t---. 

x^o ln(l — x) + e x — 1 

(b) Use the Maclaurin series expansion up to x A to find the following limit: 

e x — ln(l + x 2 ) — cos x + sin x — 2x 

hm- - -. 

x ^° 2v4 + x 2 + cos a; — 5 

10.8. In the special theory of relativity the energy E of a particle of mass m and speed v is given by

$$E = \frac{mc^2}{\sqrt{1-(v/c)^2}},$$

where c is the speed of light. Show that for ordinary speeds (v ≪ c), one obtains the classical expression for the kinetic energy, defined to be E minus the rest energy.


10.9. The gravitational potential energy for a particle of mass m at a distance r from the center of a planet of radius R and mass M is given by

$$\Phi(r) = -\frac{GMm}{r} + C, \qquad r > R.$$

(a) Find C so that the potential at the surface of the planet is zero.

(b) Show that at a height h ≪ R above the surface of the planet, the potential energy can be written as mgh. Find g in terms of M and R and calculate the numerical value of g for the Earth, the Moon, and Jupiter. Look up the data you need in a table usually found in introductory physics or astronomy books.


10.10. Prove the hyperbolic identities of Equation (10.21).

10.11. Show that

$$\operatorname{sech}^2x = 1 - \tanh^2x, \qquad \operatorname{cosech}^2x = \coth^2x - 1,$$

and

$$\frac{d}{dx}\tanh x = \operatorname{sech}^2x, \qquad \frac{d}{dx}\coth x = -\operatorname{cosech}^2x.$$

10.12. Derive Equation (10.23).





10.13. Use L’Hopital’s rule to obtain the following limits: 



(b) lim a 

(c) lim_ . 

(d) lim a 

(e) lim i (ta,nx) cosx . 

x—>T;7r 

(f) lim^. 


10.14. Use L’Hopital’s rule to obtain the limits of Example 10.4.1. 

10.15. Show that the following sequences converge and find their limits: 


In n n 2 


nP 


-r> nln 1 + r > p ( n ) e 


where p is a positive number and P(n) is a polynomial in n. 


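10.16. The Yukawa potential of a charge distribution is given by

$$\Phi(\mathbf r) = k_e\int_\Omega\frac{e^{-\kappa|\mathbf r-\mathbf r'|}}{|\mathbf r-\mathbf r'|}\,dq(\mathbf r'),$$

where κ is a constant. By expanding |r − r'| up to the first order in r'/r, show that

$$\Phi(\mathbf r) \approx \frac{k_eQe^{-\kappa r}}{r} + \frac{k_e(\kappa r+1)e^{-\kappa r}}{r^3}\,\mathbf p\cdot\mathbf r,$$

where p is the dipole moment of the charge distribution.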


10.17. A conic surface has an opening angle of 2α and a lateral length a, as shown in Figure 10.5. It carries a uniform charge density σ.

(a) Show that the electrostatic potential at a distance r from the vertex on the axis of the cone is

$$\Phi(r) = 2\pi k_e\sigma\sin\alpha\left(\sqrt{r^2+a^2-2ar\cos\alpha} - r\right) + (2\pi k_e\sigma\sin\alpha\cos\alpha)\,r\ln\frac{a - r\cos\alpha + \sqrt{r^2+a^2-2ar\cos\alpha}}{r - r\cos\alpha}.$$

Figure 10.5: The cone of Problem 10.17. 
(b) Now suppose that r ≫ a, expand the square roots and the log up to the second power of the ratio a/r, and show that

$$\sqrt{r^2+a^2-2ar\cos\alpha} \approx r - a\cos\alpha + \frac{a^2}{2r}\sin^2\alpha$$


and

$$\ln\frac{a - r\cos\alpha + \sqrt{r^2+a^2-2ar\cos\alpha}}{r - r\cos\alpha} \approx \frac{a}{r} + \frac{a^2}{2r^2}\cos\alpha.$$


(c) Put (a) and (b) together to show that the potential can be approximated by

$$\Phi(r) \approx \frac{\pi k_e\sigma a^2\sin\alpha}{r}.$$

Write this expression in terms of the total charge in the cone. Do you get what you expect?

10.18. Recall from your introductory physics courses that the electric field at a distance ρ from a long uniformly charged rod has only a radial component, which is given by E = λ/(2πε₀ρ), where λ is the linear charge density. Show this result by setting a = −L/2 (why?) and taking the limit of infinite L in Equation (4.13).

10.19. After calculating the potentials of Problems 4.11 and 4.12 for finite L, find their limits when L → ∞.


10 .20. The potential of a certain charge distribution with total charge Q is 
given by 

<I> = — / [In |r — r'| — In b] dq(r'), 

«o J 

where k e , a o, and b are constants. 

(a) Show that for r' ≪ r, one can use the approximation

$$\ln|\mathbf r - \mathbf r'| \approx \ln r - \frac{\hat{\mathbf e}_r\cdot\mathbf r'}{r}.$$


(b) Use (a) to show that the multipole expansion of d> only up to the dipole 
moment is 


$ 



fee P • r 

a 0 r 2 


10.21. Find the dipole moment of a uniformly charged sphere about its center. 


10.22. A voltage is given by the graph shown in Figure 10.6.

(a) Write the function V(t) describing the voltage for 0 < t < 2 T. 

(b) If this voltage repeats itself periodically, find the Fourier series expansion 
of V(t). 
Figure 10.6: The voltage of Problem 10.22.


10.23. A periodic voltage with period 2T is given by

$$V(t) = \begin{cases} V_0\cos(\pi t/T) & \text{if } -T/2<t<T/2,\\ 0 & \text{if } T/2<|t|<T.\end{cases}$$

(a) Sketch this function for the interval −3T < t < 3T.

(b) Find a₀ and a₁, the first two cosine coefficients of the Fourier series expansion of V(t).

(c) Find aₙ and all bₙ, the sine coefficients.

(d) Write down the Fourier series of V(t). Evaluate both sides at t = 0 to show that

$$\frac{\pi}{4} = \frac12 - \sum_{n=1}^{\infty}\frac{(-1)^n}{4n^2-1}.$$

This is one of the many series representations of π.


10.24. An electric voltage V(t) is given by 


V(t) 



0 < t < T 


and repeats itself with period T. 

(a) Sketch V (t) for values of t from t = 0 to t = 3 T. 

(b) Find the Fourier series expansion of V(t). 

10.25. A periodic voltage is given by the formula

$$V(t) = \begin{cases} V_0\sin(\pi t/2T) & \text{if } 0<t<T,\\ 0 & \text{if } T<t<2T.\end{cases}$$


(a) Sketch the voltage for the interval (—4T, 4T). 

(b) Find the Fourier series representation of this voltage. 


10.26. A periodic voltage with period 4T is given by

$$V(t) = \begin{cases} V_0\left(1-\dfrac{t^2}{T^2}\right) & \text{if } |t|<T,\\ 0 & \text{if } T<|t|<2T.\end{cases}$$
(a) Sketch this function for the interval −6T < t < 6T.

(b) Find a₀, aₙ, and bₙ, the coefficients of the Fourier series expansion of V(t).

(c) Write down the Fourier series of V(t). 

(d) Evaluate both sides at t = T. Do you obtain an identity? If not, what 
sort of relationship is obtained if we demand the equality of both sides? 

10.27. Write out Equation (10.50) up to the second power in the Δ's.

10.28. Find the Taylor series expansion of $e^x\ln(1+y)$ about (0, 0).

10.29. (a) Find the multivariable Taylor series expansion of $e^{xy}$ about (0, 0).

(b) Now let z = xy, expand the function $e^z$, and substitute xy for z in the expansion. Show that the results of (a) and (b) agree.

10.30. Determine all the solutions of the differential equation

$$\frac{dx}{dt} + 2tx = 0$$

using infinite power series. From the power series solution guess the closed-form solution. Now suppose that x(0) = 1. What is the specific solution with this property?

10.31. Consider the differential equation

$$\frac{dx}{dt} + 3t^2x = 0.$$

(a) Use a solution of the form $x(t) = \sum_{n=0}^{\infty}a_nt^n$ and find a₁ and a₂.

(b) Find a recursion relation relating coefficients.

(c) From the recursion relation determine the radius of convergence of the infinite series.

(d) Find all coefficients in terms of only one.

(e) Guess the closed-form solution from the series. Now suppose that x(0) = 2. What is the specific solution with this property? What is the numerical value of x(−2)?




Chapter 11 


Integrals and Series as 
Functions 


The notion of a function as a mathematical entity has a long history, as rich as the history of mathematics itself. With the invention of the coordinate plane in the seventeenth century, functions started to acquire graphical representations which, in turn, facilitated the connection between algebra and geometry. It was really calculus that triggered an explosion in function theory, and indeed, in all mathematics. With calculus came not only the concepts of differentiation and integration but also (in the hands of Newton and his contemporaries, as they were studying no smaller an object than the universe itself) that of differential equations. All these concepts, in particular integration and differential equations, had a dramatic influence on the notion of functions. The aim of this chapter is to give the reader a flavor of the variety of functions made possible by integration and differential equations.¹


11.1 Integrals as Functions 


Integrals are one of the most convenient media in which new functions can be 
defined. As we saw in Chapter 3, if the integrand or the limits of integration 
include parameters, those parameters can be treated as variables and the 
integral itself as a function of those parameters. In this section, we list some 
of the most important functions that are normally defined in terms of integrals. 


1 We shall not solve any differential equations in this chapter, but simply quote solutions 
to some of them in the form of power series. We shall come back to differential equations 
later in the book. 
11.1.1 Gamma Function




Consider the integral 

$$\Gamma(x) = \int_0^\infty t^{x-1}e^{-t}\,dt, \tag{11.1}$$

where $x$ is a real number (the integral converges only for $x > 0$). Integrate Equation (11.1) by parts with $u = t^{x-1}$
and $dv = e^{-t}\,dt$ to obtain

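$$\Gamma(x) = \underbrace{\Big[-t^{x-1}e^{-t}\Big]_0^\infty}_{=0\ (x>1)} + (x-1)\underbrace{\int_0^\infty t^{x-2}e^{-t}\,dt}_{=\Gamma(x-1)},$$

or

$$\Gamma(x) = (x-1)\Gamma(x-1). \tag{11.2}$$

In particular, if $x$ is a positive integer $n$, then repeated use of Equation (11.2)
gives

$$\Gamma(n) = (n-1)\Gamma(n-1) = (n-1)(n-2)\Gamma(n-2) = \cdots = (n-1)(n-2)\cdots 1\cdot\Gamma(1) = (n-1)!,$$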

where we used the fact that $\Gamma(1) = 1$, as the reader may easily verify using
Equation (11.1). This equation is written as

$$\Gamma(n+1) = n! \quad \text{for positive integer } n. \tag{11.3}$$
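As a quick numerical illustration of Equations (11.1) and (11.3), the following Python sketch approximates the integral (11.1) with Simpson's rule on a truncated interval and compares the result with $n!$. The function name, the cutoff `upper`, and the step count are illustrative choices, not part of the text; for modest integer arguments the neglected tail beyond the cutoff is negligible.

```python
import math

def gamma_integral(x, upper=60.0, steps=6000):
    """Approximate Eq. (11.1), Gamma(x) = int_0^inf t**(x-1) e**(-t) dt,
    by Simpson's rule on [0, upper]. Restricted to x >= 1 so that the
    integrand stays finite at t = 0 (accuracy is best for integer x)."""
    if x < 1:
        raise ValueError("this sketch assumes x >= 1")
    if steps % 2:                      # Simpson's rule needs an even number of panels
        steps += 1
    h = upper / steps
    f = lambda t: t ** (x - 1) * math.exp(-t)
    total = f(0.0) + f(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(i * h)
    return total * h / 3

# Eq. (11.3): Gamma(n + 1) should reproduce n!
for n in range(1, 7):
    print(n, gamma_integral(n + 1), math.factorial(n))
```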


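Let us rewrite (11.2) as $\Gamma(x-1) = \Gamma(x)/(x-1)$. Then,

$$\lim_{x\to 1}|\Gamma(x-1)| = \lim_{x\to 1}\frac{\Gamma(x)}{|x-1|} = \infty,$$

because $\Gamma(1) = 1$. Thus, $\Gamma(0) = \infty$ (in the sense that $|\Gamma|$ grows without bound there). Similarly,

$$\lim_{x\to 0}|\Gamma(x-1)| = \lim_{x\to 0}\frac{|\Gamma(x)|}{|x-1|} = \infty,$$

i.e., $\Gamma(-1) = \infty$. It is clear that $\Gamma(n) = \infty$ for any negative integer $n$ or zero.
It turns out that these are the only points at which $\Gamma(x)$ is not defined.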

Definition 11.1.1. The function defined by Equation (11.1) is called the
gamma function, which, because it satisfies Equation (11.3), is the generalization of the factorials to noninteger values. We sometimes write

$$\Gamma(x+1) = x! \quad \text{for any real } x \tag{11.4}$$

and call $\Gamma$ the factorial function. The gamma function is defined for all
values of its argument except zero and negative integers, for which the gamma
function becomes infinite.

2 The most complete analytic discussion of $\Gamma(z)$ allows $z$ to be complex and uses the
full machinery of complex calculus. Here, we shall avoid such completeness and refer the
reader to Hassani, S. Mathematical Physics: A Modern Introduction to Its Foundations,
Springer-Verlag, 1999, where a full discussion of $\Gamma(z)$ can be found in Section 11.4.
It follows from Equation (11.2) that by repeatedly subtracting 1 from the
argument of the gamma function (or adding 1 to it, when $x$ is negative), we can reduce the evaluation of $\Gamma(x)$ to the
case where $x$ lies between 0 and 1. Such an evaluation can be done numerically
and the results tabulated.
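The reduction just described can be written as a short loop. In this sketch, `math.gamma` merely stands in for a numerical table of $\Gamma$ on $0 < x \le 1$, and the function name `gamma_by_recursion` is illustrative. Arguments above 1 are lowered with (11.2), and negative noninteger arguments are raised with the same relation read backwards, $\Gamma(x) = \Gamma(x+1)/x$.

```python
import math

def gamma_by_recursion(x, base=math.gamma):
    """Evaluate Gamma(x) by shifting x into (0, 1] with Eq. (11.2).

    `base` plays the role of a table of Gamma on (0, 1]; math.gamma is used
    here only as a stand-in for such a table.
    """
    if x <= 0 and x == int(x):
        raise ValueError("Gamma is infinite at zero and negative integers")
    factor = 1.0
    while x > 1:           # Gamma(x) = (x - 1) Gamma(x - 1)
        x -= 1
        factor *= x
    while x <= 0:          # Gamma(x) = Gamma(x + 1) / x
        factor /= x
        x += 1
    return factor * base(x)

print(gamma_by_recursion(4.5), math.gamma(4.5))
print(gamma_by_recursion(-1.5), math.gamma(-1.5))
```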

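Example 11.1.1. In this example, we evaluate $\Gamma(\frac12)$. Equation (11.1) gives

$$\Gamma(\tfrac12) = \int_0^\infty t^{-1/2}e^{-t}\,dt.$$

Change the variable of integration to $u = \sqrt{t}$ with $du = dt/(2\sqrt{t})$. Then

$$\Gamma(\tfrac12) = 2\int_0^\infty e^{-u^2}\,du = \sqrt{\pi},$$

where we used the result of Example 3.3.1.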

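With $\Gamma(\frac12)$ at our disposal, we can evaluate the gamma function at any half-integer value by the remarks above. For example,

$$\Gamma(\tfrac72) = \tfrac52\Gamma(\tfrac52) = (\tfrac52)(\tfrac32)\Gamma(\tfrac32) = (\tfrac52)(\tfrac32)(\tfrac12)\Gamma(\tfrac12) = \tfrac{15}{8}\sqrt{\pi}.$$

Similarly, with $\Gamma(\frac12) = -\frac12\Gamma(-\frac12)$, we obtain

$$\Gamma(-\tfrac12) = -2\Gamma(\tfrac12) = -2\sqrt{\pi}.$$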
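Because every step of the recursion only multiplies or divides by a rational number, $\Gamma$ at any half-integer is a rational multiple of $\sqrt{\pi}$. The following sketch (the helper name is our own) tracks that rational coefficient exactly with Python's `fractions.Fraction`, starting from $\Gamma(\frac12) = \sqrt{\pi}$, and reproduces the two results of the example.

```python
import math
from fractions import Fraction

def half_integer_gamma_coeff(m):
    """Return the rational c such that Gamma(m + 1/2) = c * sqrt(pi), for integer m,
    starting from Gamma(1/2) = sqrt(pi) and applying Eq. (11.2)."""
    c, x = Fraction(1), Fraction(1, 2)
    while m > 0:            # step up:   Gamma(x + 1) = x * Gamma(x)
        c *= x
        x += 1
        m -= 1
    while m < 0:            # step down: Gamma(x - 1) = Gamma(x) / (x - 1)
        x -= 1
        c /= x
        m += 1
    return c

print(half_integer_gamma_coeff(3))     # coefficient of sqrt(pi) in Gamma(7/2)
print(half_integer_gamma_coeff(-1))    # coefficient of sqrt(pi) in Gamma(-1/2)
print(float(half_integer_gamma_coeff(3)) * math.sqrt(math.pi), math.gamma(3.5))
```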

It is instructive to generalize the result of the example above and find a 
general formula for the gamma function of any half-integer. Such a formula 
is related to the notion of the double factorial: 

Definition 11.1.2. The double factorial $(2n)!!$ [or $(2n-1)!!$] is defined as
the product of all even (or odd) positive integers up to $2n$ (or $2n-1$).

Problem 11.1 gives the details of the derivation of the following formulas:

$$(2n)!! = 2^n n! = 2^n\Gamma(n+1), \qquad (2n-1)!! = \Gamma(n+\tfrac12)\,2^n\pi^{-1/2}. \tag{11.5}$$
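Equation (11.5) is easy to spot-check numerically. The sketch below computes double factorials directly from Definition 11.1.2 (with the common convention that the empty product equals 1) and compares them with the two closed forms; the helper name is our own.

```python
import math

def double_factorial(k):
    """k!! = k (k - 2) (k - 4) ..., stopping at 2 or 1; the empty product (k <= 1) is 1."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result

# Check both halves of Eq. (11.5) for small n.
for n in range(1, 10):
    assert double_factorial(2 * n) == 2 ** n * math.factorial(n)
    assert math.isclose(double_factorial(2 * n - 1),
                        2 ** n * math.gamma(n + 0.5) / math.sqrt(math.pi))
print("Eq. (11.5) holds for n = 1, ..., 9")
```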

An extremely useful approximation to the gamma function is the so-called
Stirling approximation, which is valid for large arguments of the gamma
function and which we present without derivation:³

$$x! = \Gamma(x+1) \approx \sqrt{2\pi}\,e^{-x}x^{x+1/2}. \tag{11.6}$$

The Stirling formula works best when $x$ is large. However, even for $x = 10$,
it gives $\sqrt{2\pi}\,e^{-10}\,10^{10.5} = 3598696$, which is surprisingly close to the exact
value of $10! = 3628800$. For $x = 20$, the Stirling formula yields $2.42\times10^{18}$
to three significant figures as opposed to the calculator result, which to the
same number of significant figures is $2.43\times10^{18}$. For larger and larger values
of $x$, the ratio of the two results gets closer and closer to 1.
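The numbers quoted above can be reproduced with a few lines of Python; `math.factorial` gives the exact value, and the function `stirling` (our name) is simply Equation (11.6). Printing the ratio rather than the difference makes the trend clear: it is the relative error that shrinks as $x$ grows.

```python
import math

def stirling(x):
    """Stirling's approximation, Eq. (11.6): x! ~ sqrt(2 pi) e^(-x) x^(x + 1/2)."""
    return math.sqrt(2 * math.pi) * math.exp(-x) * x ** (x + 0.5)

for n in (10, 20, 50, 100):
    exact = math.factorial(n)
    approx = stirling(n)
    print(f"n={n:3d}  exact={exact:.4e}  Stirling={approx:.4e}  ratio={approx / exact:.6f}")
```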

3 For a derivation, see Hassani, S. Mathematical Physics: A Modern Introduction to Its 
Foundations, Springer-Verlag, 1999, Chapter 11. 


the values of $\Gamma(x)$
for $0 < x < 1$
determine $\Gamma(x)$
for all $x$.


double factorial
defined


11.1.2 The Beta Function 

A function that sometimes shows up in applications is the beta function. 
Consider 

$$\Gamma(x)\Gamma(y) = \int_0^\infty t^{x-1}e^{-t}\,dt\int_0^\infty s^{y-1}e^{-s}\,ds = \int_0^\infty\!\!\int_0^\infty t^{x-1}s^{y-1}e^{-(t+s)}\,dt\,ds.$$


Introduce the new variable u = t + s and use it to rewrite the s integral. Since 
the lower limits of both s and t are 0, the lower limit of the u integral will 
also be 0. Similarly, the upper limit of u will be infinity. However, since s 
and t are positive and their sum is u, the upper limit of t cannot exceed u. 
Therefore, 

$$\Gamma(x)\Gamma(y) = \int_0^\infty du\int_0^u dt\,t^{x-1}(u-t)^{y-1}e^{-u}.$$

Now introduce another variable $w$ by $t = uw$. Since in the $t$ integration, $u$ is
held constant, we have $dt = u\,dw$, and the limits of integration for $w$ are 0
and 1. This will allow us to write


$$\Gamma(x)\Gamma(y) = \underbrace{\int_0^\infty du\,e^{-u}u^{x+y-1}}_{=\Gamma(x+y)}\int_0^1 dw\,w^{x-1}(1-w)^{y-1}.$$

The last integral defines the beta function. So,

$$B(x,y) \equiv \frac{\Gamma(x)\Gamma(y)}{\Gamma(x+y)} = \int_0^1 dt\,t^{x-1}(1-t)^{y-1}, \tag{11.7}$$

where we changed the (dummy) variable of integration from $w$ to $t$.

We can find another representation of the beta function by substituting
$t = \sin^2\theta$. Then

$$dt = 2\sin\theta\cos\theta\,d\theta, \qquad 1 - t = 1 - \sin^2\theta = \cos^2\theta,$$

and the limits of integration become 0 and $\pi/2$. So,

$$B(x,y) = 2\int_0^{\pi/2}(\sin\theta)^{2x-1}(\cos\theta)^{2y-1}\,d\theta. \tag{11.8}$$
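As a sanity check of (11.7) and (11.8), the sketch below evaluates the beta function three ways: from the gamma-function ratio, from the $t$ integral, and from the trigonometric integral. The function names are our own, and a plain midpoint rule is used for the integrals; that is an illustrative choice which works well here because the chosen exponents ($x, y \ge 1$) keep the integrands bounded.

```python
import math

def beta_from_gamma(x, y):
    """First form of Eq. (11.7): B(x, y) = Gamma(x) Gamma(y) / Gamma(x + y)."""
    return math.gamma(x) * math.gamma(y) / math.gamma(x + y)

def midpoint(f, a, b, n=20000):
    """Composite midpoint rule on [a, b] with n panels."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

def beta_from_t(x, y):
    """Second form of Eq. (11.7); the integrand is bounded for x, y >= 1."""
    return midpoint(lambda t: t ** (x - 1) * (1 - t) ** (y - 1), 0.0, 1.0)

def beta_from_theta(x, y):
    """Eq. (11.8): B(x, y) = 2 * int_0^{pi/2} sin^(2x-1) cos^(2y-1) d(theta)."""
    return 2 * midpoint(
        lambda th: math.sin(th) ** (2 * x - 1) * math.cos(th) ** (2 * y - 1),
        0.0, math.pi / 2)

for x, y in [(2, 3), (1.5, 2)]:
    print(x, y, beta_from_gamma(x, y), beta_from_t(x, y), beta_from_theta(x, y))
```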


Integration and differentiation and the whole machinery of calculus opened up entirely new ways of defining functions. Of these, one of the most important is the
gamma function, which arose from work on two problems, interpolation theory and
antidifferentiation. The problem of interpolation had been considered by James Stirling (1692-1770), Daniel Bernoulli (1700-1782), and Christian Goldbach. It was posed
to Euler and he announced his solution in a letter of October 13, 1729, to Goldbach. 
A second letter, of January 8, 1730, brought in the integration problem. 
The interpolation problem had to do with giving meaning to $n!$ for nonintegral
values of $n$, and the integration problem was the evaluation of an integral already
considered by Wallis, namely

$$\int_0^1 t^x(1-t)^y\,dt.$$


Euler showed that this integral led to our integral (11.1). 

Leonhard Euler was Switzerland’s foremost scientist and one of the three 
greatest mathematicians of modern times (Gauss and Riemann being the other two). 
He was perhaps the most prolific author of all time in any field. From 1727 to 1783 
his writings poured out in a seemingly endless flood, constantly adding knowledge 
to every known branch of pure and applied mathematics, and also to many that 
were not known until he created them. He averaged about 800 printed pages a 
year throughout his long life, and yet he almost always had something worthwhile 
to say. The publication of his complete works was started in 1911, and the end 
is not in sight. This edition was planned to include 887 titles in 72 volumes, but 
since that time extensive new deposits of previously unknown manuscripts have been 
unearthed, and it is now estimated that more than 100 large volumes will be required 
for completion of the project. Euler evidently wrote mathematics with the ease and 
fluency of a skilled speaker discoursing on subjects with which he is intimately 
familiar. His writings are models of relaxed clarity. He never condensed, and he 
reveled in the rich abundance of his ideas and the vast scope of his interests. The 
French physicist Arago, in speaking of Euler’s incomparable mathematical facility, 
remarked that “He calculated without apparent effort, as men breathe, or as eagles 
sustain themselves in the wind.” He suffered total blindness during the last 17 years 
of his life, but with the aid of his powerful memory and fertile imagination, and 
with assistants to write his books and scientific papers from dictation, he actually 
increased his already prodigious output of work. 

Euler was a native of Basel and a student of Johann Bernoulli at the University, 
but he soon outstripped his teacher. He was also a man of broad culture, well 
versed in the classical languages and literatures (he knew the Aeneid by heart), 
many modern languages, physiology, medicine, botany, geography, and the entire 
body of physical science as it was known in his time. His personal life was as placid 
and uneventful as is possible for a man with 13 children. 

Though he was not himself a teacher, Euler has had a deeper influence on the 
teaching of mathematics than any other person. This came about chiefly through 
his three great treatises: Introductio in Analysin Infinitorum (1748); Institutiones
Calculi Differentialis (1755); and Institutiones Calculi Integralis (1768-1794). There
is considerable truth in the old saying that all elementary and advanced calculus
textbooks since 1748 are essentially copies of Euler or copies of copies of Euler.
These works summed up and codified the discoveries of his predecessors, and are
full of Euler’s own ideas. He extended and perfected plane and solid analytic geometry, introduced the analytic approach to trigonometry, and was responsible for the
modern treatment of the functions $\ln x$ and $e^x$. He created a consistent theory of
logarithms of negative and imaginary numbers, and discovered that $\ln x$ has an infinite number of values. It was through his work that the symbols $e$, $\pi$, and $i = \sqrt{-1}$
became common currency for all mathematicians, and it was he who linked them
together in the astonishing relation $e^{i\pi} = -1$. Among his other contributions to
standard mathematical notation were $\sin x$, $\cos x$, the use of $f(x)$ for an unspecified
function, and the use of $\sum$ for summation.



Leonhard Euler 
1707-1783 
His work in all departments of analysis strongly influenced the further development of this subject through the next two centuries. He contributed many important
ideas to differential equations, including substantial parts of the theory of second-order linear equations and the method of solution by power series. He gave the first
systematic discussion of the calculus of variations, which he founded on his basic 
differential equation for a minimizing curve. He discovered the integral defining the 
gamma function and developed many of its applications and special properties. He 
also worked with Fourier series, encountered the Bessel functions in his study of the 
vibrations of a stretched circular membrane, and applied Laplace transforms to solve 
differential equations—all before Fourier, Bessel, and Laplace were born. 

E. T. Bell, the well-known historian of mathematics, observed that “One of the 
most remarkable features of Euler’s universal genius was its equal strength in both 
of the main currents of mathematics, the continuous and the discrete.” In the realm 
of the discrete, he was one of the originators of number theory and made many far-reaching contributions to this subject throughout his life. In addition, the origins
of topology (one of the dominant forces in modern mathematics) lie in his solution
of the Königsberg bridge problem and his formula $V - E + F = 2$ connecting the
numbers of vertices, edges, and faces of a simple polyhedron. 

The distinction between pure and applied mathematics did not exist in Euler’s 
day, and for him the entire physical universe was a convenient object whose diverse 
phenomena offered scope for his methods of analysis. The foundations of classical 
mechanics had been laid down by Newton, but Euler was the principal architect. In 
his treatise of 1736 he was the first to explicitly introduce the concept of a mass-point, or particle, and he was also the first to study the acceleration of a particle
moving along any curve and to use the notion of a vector in connection with velocity 
and acceleration. His continued successes in mathematical physics were so numerous, 
and his influence was so pervasive, that most of his discoveries are not credited to 
him at all and are taken for granted in the physics community as part of the natural 
order of things. However, we do have Euler’s angles for the rotation of a rigid body, 
and the all-important Euler-Lagrange equation of variational dynamics. 


11.1.3 The Error Function 

The error function, used extensively in statistics, is defined as

$$\operatorname{erf}(x) = \frac{1}{\sqrt{\pi}}\int_{-x}^{x}e^{-t^2}\,dt = \frac{2}{\sqrt{\pi}}\int_0^x e^{-t^2}\,dt \tag{11.9}$$

and has the property that $\operatorname{erf}(\infty) = 1$. The error function $\operatorname{erf}(x)$ gives the
area under the bell-shaped (normal) probability density $e^{-t^2}/\sqrt{\pi}$ between
$-x$ and $+x$.
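A direct numerical evaluation of the second integral in (11.9) makes the definition concrete. The sketch below uses the composite Simpson rule (our choice; any standard quadrature would do) in a helper we call `erf_simpson`, and compares the result with Python's built-in `math.erf`; the values also creep toward 1 as $x$ grows, in line with $\operatorname{erf}(\infty) = 1$.

```python
import math

def erf_simpson(x, steps=2000):
    """Approximate Eq. (11.9), erf(x) = (2/sqrt(pi)) int_0^x exp(-t^2) dt,
    with the composite Simpson rule."""
    if steps % 2:                     # Simpson's rule needs an even number of panels
        steps += 1
    h = x / steps
    total = 1.0 + math.exp(-x * x)    # f(0) + f(x)
    for i in range(1, steps):
        t = i * h
        total += (4 if i % 2 else 2) * math.exp(-t * t)
    return 2 / math.sqrt(math.pi) * total * h / 3

for x in (0.5, 1.0, 2.0, 4.0):
    print(x, erf_simpson(x), math.erf(x))
```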

11.1.4 Elliptic Functions 

Recall from calculus⁴ that the element of length of a curve parameterized by

$$x = f(t), \quad y = g(t), \quad z = h(t), \qquad t_1 \le t \le t_2,$$

4 Or from our discussion of the parametric equation of curves in Chapter 4. 
in Cartesian coordinates is

$$dl = \sqrt{dx^2 + dy^2 + dz^2} = \sqrt{[f'(t)]^2 + [g'(t)]^2 + [h'(t)]^2}\,dt,$$

where the prime indicates the derivative. So, the length $L$ of the curve connecting
the initial point $(f(t_1), g(t_1), h(t_1))$ to the final point $(f(t_2), g(t_2), h(t_2))$ is

$$L = \int_{t_1}^{t_2}\sqrt{[f'(t)]^2 + [g'(t)]^2 + [h'(t)]^2}\,dt. \tag{11.10}$$

The length of many curves, some very complicated-looking, can be found
analytically using Equation (11.10). However, the length of a curve as simple
as an ellipse cannot be found this way! Let us see what we get when we
try to calculate the circumference of an ellipse. The parametric equation of
an ellipse of respective semi-major and semi-minor axes $a$ and $b$ lying in the
$xy$-plane is conveniently written as


$$x = a\sin t, \quad y = b\cos t, \quad z = 0, \qquad 0 \le t \le 2\pi. \tag{11.11}$$


Substitution of these equations in (11.10) yields

$$\begin{aligned} L &= \int_0^{2\pi}\sqrt{[a\cos t]^2 + [-b\sin t]^2 + [0]^2}\,dt = \int_0^{2\pi}\sqrt{a^2\cos^2 t + b^2\sin^2 t}\,dt \\ &= \int_0^{2\pi}\sqrt{a^2(1-\sin^2 t) + b^2\sin^2 t}\,dt = a\int_0^{2\pi}\sqrt{1 - k^2\sin^2 t}\,dt, \end{aligned} \tag{11.12}$$

where $k^2 = (a^2 - b^2)/a^2$. This innocent-looking integral does not succumb
to any technique of integration. It was this resistance to analytical solution
that prompted nineteenth-century mathematicians to study this and other
related integrals as functions in their own right.

The elliptic integral of the first kind is defined as

$$F(\varphi, k) = \int_0^{\varphi}\frac{dt}{\sqrt{1 - k^2\sin^2 t}}, \tag{11.13}$$

with $F$ a function of two variables because the integral involves two parameters, one appearing in the integrand and the other appearing as a limit.

The elliptic integral of the second kind is defined as

$$E(\varphi, k) = \int_0^{\varphi}\sqrt{1 - k^2\sin^2 t}\,dt. \tag{11.14}$$

The elliptic integral of the second kind can be interpreted as the length of
partial arcs of an ellipse. The circumference $L$ of an ellipse with respective
semi-major and semi-minor axes $a$ and $b$ is simply

$$L = aE(2\pi, k), \qquad \text{where } k = \frac{\sqrt{a^2 - b^2}}{a}.$$


there is no formula 
in closed form for 
the circumference 
of an ellipse! 


elliptic integral of 
the first kind 


elliptic integral of 
the second kind 
complete elliptic
integrals 


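It is common to define the complete elliptic integrals of the first and
second kinds:

$$K(k) \equiv F(\tfrac{\pi}{2}, k) = \int_0^{\pi/2}\frac{dt}{\sqrt{1 - k^2\sin^2 t}}, \qquad E(k) \equiv E(\tfrac{\pi}{2}, k) = \int_0^{\pi/2}\sqrt{1 - k^2\sin^2 t}\,dt. \tag{11.15}$$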


The reader may easily verify (Problem 11.10) that the total circumference of 
an ellipse can be given in terms of complete elliptic integrals. 
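To see the circumference formula in action, the sketch below evaluates the complete integral $E(k)$ of (11.15) with Simpson's rule and uses $E(2\pi, k) = 4E(k)$, which follows from the symmetry of $\sin^2 t$ over the four quarter-periods. The function names are our own, and two of the cases are limits where the answer is known: a circle ($a = b$, so $k = 0$ and $L = 2\pi a$) and a flattened ellipse ($b \to 0$, so $k = 1$ and the curve degenerates into a segment of length $2a$ traversed twice, giving $L = 4a$).

```python
import math

def complete_E(k, steps=2000):
    """Complete elliptic integral of the second kind, Eq. (11.15):
    E(k) = int_0^{pi/2} sqrt(1 - k^2 sin^2 t) dt, by Simpson's rule (steps even)."""
    h = (math.pi / 2) / steps
    f = lambda t: math.sqrt(1 - (k * math.sin(t)) ** 2)
    total = f(0.0) + f(math.pi / 2)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(i * h)
    return total * h / 3

def ellipse_circumference(a, b):
    """L = a E(2 pi, k) = 4 a E(k), with k = sqrt(a^2 - b^2) / a.
    Assumes a >= b >= 0 (a horizontal ellipse, as in Eq. (11.11))."""
    k = math.sqrt(a * a - b * b) / a
    return 4 * a * complete_E(k)

print(ellipse_circumference(1.0, 1.0), 2 * math.pi)   # circle: k = 0
print(ellipse_circumference(1.0, 0.0))                # degenerate ellipse: L = 4a
print(ellipse_circumference(2.0, 1.0))                # a genuine ellipse, no closed form
```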

The parameterization given in Equation (11.11) is that of a horizontal 
ellipse (a > b). However, one may wish to start with a vertical ellipse (a < b). 
Then, as the reader may verify, one ends up with an integral similar to (11.14), 
except that the coefficient of $\sin^2 t$ is $+k^2$. Would this be a new elliptic
integral? Problem 11.9 shows that the new integral can be written as a sum 
of the existing elliptic integrals. 

large-angle 
pendulum and 
elliptic integrals 

Example 11.1.2. Elliptic integrals show up in areas of physics totally unrelated
to the circumference of an ellipse. Consider a pendulum of mass $m$ and length $l$
displaced by an angle $\theta$ from its equilibrium position as shown in Figure 11.1. When
the angle is $\theta$, the velocity of the pendulum is $l\dot\theta$ and its height is $h$. Conservation
of energy leads to

$$E = \text{KE} + \text{PE} = \tfrac12 m(l\dot\theta)^2 + mgh = \tfrac12 ml^2\dot\theta^2 + mg(l - l\cos\theta),$$


where $E$ is the total mechanical energy of the pendulum. If $\theta_m$ is the maximum
angular displacement, then the total energy at this angle will be just the potential
energy.⁵ It then follows that

$$\tfrac12 m(l\dot\theta)^2 + mgh = \tfrac12 ml^2\dot\theta^2 + mg(l - l\cos\theta) = mg(l - l\cos\theta_m),$$

or, after dividing both sides by $ml$,

$$\tfrac12 l\dot\theta^2 - g\cos\theta = -g\cos\theta_m. \tag{11.16}$$



Figure 11.1: The pendulum displaced by an arbitrary angle $\theta$.


5 The KE is zero at $\theta_m$, because the pendulum comes to a momentary stop there.
The elementary treatment of the pendulum problem differentiates Equation
(11.16) with respect to time, assumes that the maximum angle (and therefore any
angle) is small, and approximates $\sin\theta$ with $\theta$ in radians. This leads to

$$l^2\dot\theta\ddot\theta + gl\dot\theta\sin\theta = 0 \quad\text{or}\quad \ddot\theta + \frac{g}{l}\sin\theta = 0 \quad\Rightarrow\quad \ddot\theta + \frac{g}{l}\theta = 0,$$

which is the equation of a simple harmonic oscillator⁶ with $\omega^2 = g/l$, or $T = 2\pi\sqrt{l/g}$.
This is the famous result, known even to Galileo, that for small angles the period
of oscillation is independent of the angle.

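A more advanced treatment makes no approximation for the angle and simply
integrates (11.16). Assuming that $\dot\theta > 0$, Equation (11.16) gives

$$\frac{d\theta}{dt} = \sqrt{\frac{2g}{l}}\sqrt{\cos\theta - \cos\theta_m} = 2\sqrt{\frac{g}{l}}\sqrt{\sin^2(\theta_m/2) - \sin^2(\theta/2)}, \tag{11.17}$$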

where we used the trigonometric identity $\cos\theta = 1 - 2\sin^2(\theta/2)$. Introducing a new
variable $s$ given by

$$\sin(\theta/2) = \sin(\theta_m/2)\sin s,$$

differentiating this equation with respect to $t$, and using Equation (11.17) yields


$$\frac{ds}{dt} = \sqrt{\frac{g}{l}}\sqrt{1 - \sin^2(\theta_m/2)\sin^2 s}. \tag{11.18}$$


This leads to

$$dt = \sqrt{\frac{l}{g}}\,\frac{ds}{\sqrt{1 - \sin^2(\theta_m/2)\sin^2 s}},$$


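which can be integrated to yield

$$t = \sqrt{\frac{l}{g}}\int_0^{s}\frac{du}{\sqrt{1 - \sin^2(\theta_m/2)\sin^2 u}} = \sqrt{\frac{l}{g}}\,F\!\left(s(\theta), \sin\frac{\theta_m}{2}\right), \tag{11.19}$$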


where $s = \sin^{-1}[\sin(\theta/2)/\sin(\theta_m/2)]$, and we have assumed that at $t = 0$, the angle
$\theta$ is zero and therefore $s = 0$ as well.

Of particular interest is the period of the oscillation, which is four times the time
it takes the pendulum to go from $\theta = 0$ to $\theta = \theta_m$. These values correspond to $s = 0$
and $s = \pi/2$. It follows that

$$T = 4\sqrt{\frac{l}{g}}\int_0^{\pi/2}\frac{du}{\sqrt{1 - \sin^2(\theta_m/2)\sin^2 u}} = 4\sqrt{\frac{l}{g}}\,K\!\left(\sin\frac{\theta_m}{2}\right). \tag{11.20}$$

period of a
pendulum depends
on the amplitude
of oscillation.


6 Recall that the equation of a simple harmonic oscillator (SHO), such as a spring-mass
system with mass $m$ and spring constant $k$, is $m\ddot x + kx = 0$ or $\ddot x + (k/m)x = 0$. It is shown
in elementary physics that the angular frequency of this SHO is $\omega = \sqrt{k/m}$. Thus, in
any SHO equation in which the second derivative appears with no coefficient, the coefficient
of the undifferentiated quantity is the square of the angular frequency.
Niels Henrik Abel
1802-1829 


This shows clearly that for large maximum angles, the period does depend on the
amplitude. By expanding the integrand in a power series as developed in Chapter
10, one can obtain the deviation from constant period as powers of $\sin^2(\theta_m/2)$. We
quote the result of such an expansion:

$$T = 2\pi\sqrt{\frac{l}{g}}\left[1 + \frac14\sin^2\frac{\theta_m}{2} + \frac{9}{64}\sin^4\frac{\theta_m}{2} + \cdots\right]. \tag{11.21}$$

The reader is urged to verify this result (see Problems 11.11 and 11.12).
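Equations (11.20) and (11.21) can be compared numerically. Rather than integrating (11.20) directly, the sketch below evaluates $K$ with the arithmetic-geometric mean (AGM), using the classical identity $K(k) = \pi/[2\,\mathrm{AGM}(1, \sqrt{1-k^2})]$; this is a swapped-in technique, not one developed in this chapter, and all function names are our own. It reports the period in units of the small-angle period $2\pi\sqrt{l/g}$, so $g$ and $l$ drop out.

```python
import math

def agm(a, b, tol=1e-15):
    """Arithmetic-geometric mean of two positive numbers."""
    while abs(a - b) > tol * abs(a):
        a, b = (a + b) / 2, math.sqrt(a * b)
    return a

def complete_K(k):
    """K(k) of Eq. (11.15) via K(k) = pi / (2 agm(1, sqrt(1 - k^2))), valid for |k| < 1."""
    if not abs(k) < 1:
        raise ValueError("K(k) diverges as |k| -> 1")
    return math.pi / (2 * agm(1.0, math.sqrt(1.0 - k * k)))

def period_ratio(theta_m):
    """T / T0, with T = 4 sqrt(l/g) K(sin(theta_m/2)) from Eq. (11.20)
    and T0 = 2 pi sqrt(l/g), so the ratio is 2 K / pi."""
    return 2 * complete_K(math.sin(theta_m / 2)) / math.pi

def period_ratio_series(theta_m):
    """The first three terms of Eq. (11.21), divided by 2 pi sqrt(l/g)."""
    s2 = math.sin(theta_m / 2) ** 2
    return 1 + s2 / 4 + 9 * s2 ** 2 / 64

for degrees in (10, 45, 90, 150):
    th = math.radians(degrees)
    print(degrees, period_ratio(th), period_ratio_series(th))
```

The truncated series tracks the exact ratio closely for small and moderate amplitudes and falls behind as $\theta_m$ approaches $\pi$, where $K$ diverges.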


The study of elliptical integrals can be said to have started in 1655 when Wallis 
began to study the arc length of an ellipse. In fact he considered the arc lengths of 
various cycloids and related these arc lengths to that of the ellipse. Both Wallis and 
Newton published an infinite series expansion for the arc length of the ellipse. 

In 1679 Jacob Bernoulli attempted to find the arc length of a spiral and encountered an example of an elliptic integral. He made an important step in the theory
of elliptic integrals in 1694. He examined the shape that an elastic rod will take if 
compressed at the ends. He showed that the curve could be expressed in terms of 
an integral, which was very similar to the one obtained by Wallis. 

There is no doubt that Gauss obtained a number of key results in the theory 
of elliptic functions, because many of these were found after his death in papers he 
had never published. However, the acknowledged founders of the theory of elliptic 
functions were Abel and Jacobi. 

Niels Henrik Abel was the son of a poor pastor. As a student in Christiania 
(Oslo), Norway, he had the luck to have Berndt Holmboe (1795-1850) as a teacher. 
Holmboe recognized Abel’s genius and predicted when Abel was seventeen that he 
would become the greatest mathematician in the world. After studying at Christiania and at Copenhagen, Abel received a scholarship that permitted him to travel.
In Paris, he was presented to Legendre, Laplace, and Cauchy, but they ignored him. 
Having exhausted his funds, he departed for Berlin and spent the years 1825-1827 
with Crelle. 

He returned to Christiania so exhausted that he found it necessary, he wrote, to 
hold on to the gates of a church. To earn money he gave lessons to young students. 
He began to receive attention through his published works, and Crelle thought he 
might be able to secure him a professorship at the University of Berlin. But Abel 
became ill with tuberculosis and died in 1829 when he was only twenty-seven years 
old. 

Abel knew of the work of Euler, Lagrange, and Legendre on elliptic integrals, and 
may have gotten ideas for his own work from the work of Gauss. Abel started to 
write papers in 1825. He presented his major paper to the Academy of Sciences in 
Paris in 1826. The paper was given to Cauchy to review. But partly because of the
length and the difficulty of the paper and partly to favor his own work, Cauchy laid 
it aside. After Abel’s death, when his fame was established, the academy searched 
for the paper, found it, and published it in 1841. 

The other discoverer of elliptic functions was Carl Gustav Jacob Jacobi. 
Unlike Abel, he lived a quiet life. Born in Potsdam to a Jewish family, he studied at 
the University of Berlin and in 1827 became a professor at Königsberg. In 1842 he
had to give up his post because of ill health. He was given a pension by the Prussian 
government and retired to Berlin, where he died in 1851. His fame was great even 
during his lifetime, and his students spread his ideas to many centers. 

Jacobi taught the subject of elliptic functions for many years. His approach became the model according to which the theory of functions itself was developed. He
also worked in functional determinants (Jacobians), ordinary and partial differential 
equations, dynamics, celestial mechanics, and fluid dynamics. 

Jacobi’s work on elliptic functions started in 1827 when he submitted a paper for 
publication without proof. Almost simultaneously, Abel wrote his research paper on 
elliptic functions. Both had arrived at the key idea of working with inverse functions 
of the elliptic integrals, an idea that Abel had had since 1823. Thereafter, they both 
published on the subject. But whereas Abel died in 1829, Jacobi lived to publish 
much more. In particular, his Fundamenta Nova Theoriae Functionum Ellipticarum 
of 1829 became a leading work on the subject. 


11.2 Power Series as Functions 

Differential equations have found their way into all areas of physics from the 
motion of planets around the Sun to standing waves on a rope or a drum, 
to electrical properties of conductors, and the behavior of electromagnetic 
fields and beyond. As is always the case, no mathematics can draw more 
attention than that which deals directly with Nature. The urgency of finding 
solutions to these differential equations prompted many mathematicians of the 
latter part of the eighteenth and the beginning of the nineteenth centuries to 
concentrate heavily on certain specific differential equations. It appeared that 
every differential equation dictated by Nature gave rise to a new function. The 
most common scheme for solving these differential equations was to assume 
a power series solution, substitute the assumed solution in the differential 
equation, and determine the (unknown) coefficients from the resulting equality 
of power series. We shall come back to this powerful method in Chapters 24 
and 25 through 27. At this point, we want to simply give examples of solutions 
(functions) of certain differential equations that were discovered in the form 
of a power series. 

Chapter 10 showed how known functions (such as trigonometric and logarithmic functions) can be represented as power series. These functions had
been known prior to the popularity of infinite series, and the origin of their
discovery lay in areas of mathematics outside calculus. One does not need a
power series to calculate sin(35°); an appropriate right triangle and careful
measurement of its sides and hypotenuse will do the job. The functions we are
discussing here are defined in terms of power series and have no independent existence. With some mathematical manipulation they may be written
as definite integrals, which cannot be evaluated analytically. But such a representation
is just as abstract as an infinite series, because the integrals then become the
definition of the functions.



Carl Gustav Jacobi 
1804-1851 
11.2.1 Hypergeometric Functions

In their studies of second-order differential equations (DE), mathematicians,
always in search of generalities, came up with the most general form of a
second-order linear DE which appeared to encompass all known DEs of physical interest. This DE, called the hypergeometric differential equation,
turned out to be⁷

$$x(1-x)y'' + [\gamma - (\alpha + \beta + 1)x]y' - \alpha\beta y = 0, \tag{11.22}$$

where $\alpha$, $\beta$, and $\gamma$ are constants.⁸ The series solution of this DE, called
the hypergeometric function, can be written in terms of the gamma function as⁹


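$$F(\alpha, \beta; \gamma; x) = \frac{\Gamma(\gamma)}{\Gamma(\alpha)\Gamma(\beta)}\sum_{n=0}^{\infty}\frac{\Gamma(\alpha+n)\Gamma(\beta+n)}{\Gamma(\gamma+n)\Gamma(n+1)}x^n. \tag{11.23}$$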


From this series representation, we immediately note that the hypergeometric function is symmetric under interchange of $\alpha$ and $\beta$. Furthermore,
if either $\alpha$ or $\beta$ is a negative integer, say $-m$, then the denominator of the
constant outside becomes infinite by Definition 11.1.1. However, the gamma
function in the numerator of the first $m + 1$ terms of the sum ($n = 0, 1, \ldots, m$) will also be infinite.
The cancellation of these infinities [see Problem 11.4(c)] gives a nonzero sum
up to $n = m$, but the rest of the series will be zero. Therefore,


Box 11.2.1. The hypergeometric function is symmetric under interchange
of $\alpha$ and $\beta$: $F(\alpha, \beta; \gamma; x) = F(\beta, \alpha; \gamma; x)$. Furthermore, $F(-m, \beta; \gamma; x)$
[and therefore $F(\alpha, -m; \gamma; x)$] is a polynomial if $m$ is a positive integer.
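The series (11.23) is easiest to sum through the ratio of consecutive terms, which, once the gamma functions are simplified with $\Gamma(z+1) = z\Gamma(z)$, is the rational expression in the docstring below. Written this way the loop needs no gamma function at all, and for $\alpha = -m$ the factor $(\alpha + n)$ vanishes at $n = m$, so every later term is zero and the polynomial of Box 11.2.1 appears automatically. The function name `hyp2f1` and the stopping tolerance are illustrative choices; the sketch assumes $|x| < 1$ unless the series terminates.

```python
import math

def hyp2f1(alpha, beta, gamma, x, tol=1e-15, max_terms=10000):
    """Sum the hypergeometric series (11.23).

    The first term is 1 and consecutive terms satisfy
        c_{n+1} / c_n = (alpha + n)(beta + n) / ((gamma + n)(n + 1)) * x.
    Assumes |x| < 1 unless alpha or beta is a nonpositive integer.
    """
    if gamma <= 0 and gamma == int(gamma):
        raise ValueError("gamma must not be zero or a negative integer")
    term = total = 1.0
    for n in range(max_terms):
        term *= (alpha + n) * (beta + n) / ((gamma + n) * (n + 1)) * x
        total += term
        if term == 0.0 or abs(term) < tol * abs(total):
            break
    return total

# Symmetry in alpha and beta (Box 11.2.1)
print(hyp2f1(0.3, 1.7, 2.5, 0.4), hyp2f1(1.7, 0.3, 2.5, 0.4))
# alpha = -2: the quadratic 1 - (2 beta/gamma) x + beta(beta+1)/(gamma(gamma+1)) x^2
print(hyp2f1(-2, 3, 4, 0.5), 1 - 2 * 3 / 4 * 0.5 + 3 * 4 / (4 * 5) * 0.5 ** 2)
# F(1, 1; 2; x) sums to -ln(1 - x)/x
print(hyp2f1(1, 1, 2, 0.5), 2 * math.log(2))
```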


As mentioned before, the infinite series can often be “integrated”
and the resulting function written in terms of an integral. In this case, we
start by multiplying and dividing the series of Equation (11.23) by $\Gamma(\gamma - \beta)$
to obtain

$$F(\alpha, \beta; \gamma; x) = \frac{\Gamma(\gamma)}{\Gamma(\alpha)\Gamma(\beta)\Gamma(\gamma-\beta)}\sum_{n=0}^{\infty}\frac{\Gamma(\alpha+n)}{\Gamma(n+1)}\underbrace{\frac{\Gamma(\gamma-\beta)\Gamma(\beta+n)}{\Gamma(\gamma+n)}}_{=B(\gamma-\beta,\,\beta+n)\text{ by (11.7)}}x^n.$$


7 For a comprehensive treatment of this differential equation, see Hassani, S. Mathematical Physics: A Modern Introduction to Its Foundations, Springer-Verlag, 1999, Chapter 14.

8 Some authors use $a$, $b$, and $c$ instead of $\alpha$, $\beta$, and $\gamma$.

9 Some authors use ${}_2F_1$ instead of $F$. Our use of $F$ to represent both the elliptic integral
of the first kind and the hypergeometric function should not cause any confusion because
the two functions have different numbers of arguments (independent variables).
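Now use $\Gamma(n+1) = n!$ and the integral representation of the beta function to
get

$$\begin{aligned} F(\alpha, \beta; \gamma; x) &= \frac{\Gamma(\gamma)}{\Gamma(\alpha)\Gamma(\beta)\Gamma(\gamma-\beta)}\sum_{n=0}^{\infty}\int_0^1 dt\,(1-t)^{\gamma-\beta-1}t^{\beta+n-1}\,\Gamma(\alpha+n)\frac{x^n}{n!} \\ &= \frac{\Gamma(\gamma)}{\Gamma(\beta)\Gamma(\gamma-\beta)}\int_0^1 dt\,(1-t)^{\gamma-\beta-1}t^{\beta-1}\sum_{n=0}^{\infty}\frac{\Gamma(\alpha+n)}{\Gamma(\alpha)}\frac{(tx)^n}{n!}. \end{aligned}$$

Using the result of Problem 11.4, we can now write

$$F(\alpha, \beta; \gamma; x) = \frac{\Gamma(\gamma)}{\Gamma(\beta)\Gamma(\gamma-\beta)}\int_0^1 dt\,(1-t)^{\gamma-\beta-1}t^{\beta-1}(1-tx)^{-\alpha}. \tag{11.24}$$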

This is the integral representation of the hypergeometric function. 
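The integral representation (11.24) can be checked against a case with a known closed form: for $\alpha = \beta = 1$ and $\gamma = 2$, the series (11.23) reduces to $\sum x^n/(n+1) = -\ln(1-x)/x$. The sketch below evaluates (11.24) with a midpoint rule; it assumes $\gamma > \beta > 0$ and $x < 1$ so that the integral converges, and the function name is illustrative.

```python
import math

def hyp2f1_integral(alpha, beta, gamma, x, steps=20000):
    """Eq. (11.24) by the composite midpoint rule.
    Assumes gamma > beta > 0 and x < 1; accuracy is best when beta >= 1 and
    gamma - beta >= 1, since then the integrand has no endpoint singularity."""
    prefactor = math.gamma(gamma) / (math.gamma(beta) * math.gamma(gamma - beta))
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * h
        total += (1 - t) ** (gamma - beta - 1) * t ** (beta - 1) * (1 - t * x) ** (-alpha)
    return prefactor * total * h

for x in (-0.5, 0.5, 0.9):
    print(x, hyp2f1_integral(1, 1, 2, x), -math.log(1 - x) / x)
```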

The generality of the hypergeometric DE results in the ability to express
many functions, both elementary and the so-called special functions of mathematical physics, in terms of the hypergeometric function. For example, consider the complete elliptic integral of the second kind $E(k)$. The two factors of
double factorials in both the numerator and denominator of its series expansion (see Problem 11.13), together with Equation (11.5) and the hypergeometric series (11.23), hint at the possibility of writing $E(k)$ as a hypergeometric
function. This is indeed the case. Substituting (11.5) in the expansion of
$E(k)$ as given in Problem 11.13 yields


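$$E(k) = \frac{\pi}{2}\left\{1 - \sum_{n=1}^{\infty}\frac{\Gamma(n+\frac12)\Gamma(n+\frac12)\pi^{-1}}{\Gamma(n+1)\Gamma(n+1)}\frac{k^{2n}}{2(n-\frac12)}\right\} = \frac{\pi}{2} - \frac14\sum_{n=1}^{\infty}\frac{\Gamma(n+\frac12)\Gamma(n-\frac12)}{\Gamma(n+1)\Gamma(n+1)}(k^2)^n,$$

where we used $\Gamma(n+\frac12) = (n-\frac12)\Gamma(n-\frac12)$. The sum starts with $n = 1$. To
make it look like a hypergeometric series, we need to include the zero term as
well. Adding and subtracting this term gives

$$\begin{aligned} E(k) &= \frac{\pi}{2} - \frac14\sum_{n=1}^{\infty}\frac{\Gamma(n+\frac12)\Gamma(n-\frac12)}{\Gamma(n+1)\Gamma(n+1)}(k^2)^n - \frac14\left[\frac{\Gamma(\frac12)\Gamma(-\frac12)}{\Gamma(1)\Gamma(1)}(k^2)^0\right] + \frac14\left[\frac{\Gamma(\frac12)\Gamma(-\frac12)}{\Gamma(1)\Gamma(1)}(k^2)^0\right] \\ &= -\frac14\sum_{n=0}^{\infty}\frac{\Gamma(n+\frac12)\Gamma(n-\frac12)}{\Gamma(n+1)\Gamma(n+1)}(k^2)^n, \end{aligned}$$

because $\Gamma(-\frac12) = -2\Gamma(\frac12) = -2\sqrt{\pi}$ by Example 11.1.1, so that the constant $\pi/2$ and the added term $\frac14\Gamma(\frac12)\Gamma(-\frac12) = -\pi/2$ cancel. We now note that
except for a multiplicative constant, the sum is that of the hypergeometric
function with $\alpha = \frac12 = -\beta$ and $\gamma = 1$. Inserting the multiplicative constant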


$$\frac{\Gamma(1)}{\Gamma(\frac12)\Gamma(-\frac12)} = \frac{1}{(-2\pi)},$$

we obtain

$$E(k) = \frac{\pi}{2}F(\tfrac12, -\tfrac12; 1; k^2). \tag{11.25}$$

The reader may verify that

$$K(k) = \frac{\pi}{2}F(\tfrac12, \tfrac12; 1; k^2). \tag{11.26}$$
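Both (11.25) and (11.26) can be confirmed numerically by summing the hypergeometric series and comparing with a direct quadrature of the definitions (11.15). The short series routine below sums a fixed number of terms with the same term ratio that follows from (11.23), which is adequate for $k^2 \le 0.81$; it and the Simpson helper are illustrative, not part of the text.

```python
import math

def hyp2f1(a, b, c, x, terms=400):
    """Partial sum of the hypergeometric series (11.23), built term by term from
    c_{n+1}/c_n = (a + n)(b + n) / ((c + n)(n + 1)) * x."""
    term = total = 1.0
    for n in range(terms):
        term *= (a + n) * (b + n) / ((c + n) * (n + 1)) * x
        total += term
    return total

def simpson(f, lo, hi, steps=2000):
    """Composite Simpson rule on [lo, hi] (steps even)."""
    h = (hi - lo) / steps
    s = f(lo) + f(hi) + sum((4 if i % 2 else 2) * f(lo + i * h) for i in range(1, steps))
    return s * h / 3

def E_quad(k):
    return simpson(lambda t: math.sqrt(1 - (k * math.sin(t)) ** 2), 0.0, math.pi / 2)

def K_quad(k):
    return simpson(lambda t: 1 / math.sqrt(1 - (k * math.sin(t)) ** 2), 0.0, math.pi / 2)

for k in (0.1, 0.5, 0.9):
    print(k,
          E_quad(k), math.pi / 2 * hyp2f1(0.5, -0.5, 1.0, k * k),   # Eq. (11.25)
          K_quad(k), math.pi / 2 * hyp2f1(0.5, 0.5, 1.0, k * k))    # Eq. (11.26)
```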


Historical Notes 



Carl Friedrich 
Gauss 1777-1855 


Johann Carl Friedrich Gauss was the greatest of all mathematicians and perhaps 
the most richly gifted genius of whom there is any record. He was born in the city of 
Brunswick in northern Germany. His exceptional skill with numbers was clear at a 
very early age, and in later life he joked that he knew how to count before he could 
talk. It is said that Goethe wrote and directed little plays for a puppet theater when 
he was six and that Mozart composed his first childish minuets when he was five, 
but Gauss corrected an error in his father’s payroll accounts at the age of three. At 
the age of seven, when he started elementary school, his teacher was amazed when 
Gauss summed the integers from 1 to 100 instantly by spotting that the sum was
50 pairs of numbers, each pair summing to 101.

His long professional life is so filled with accomplishments that it is impossible 
to give a full account of them in the short space available here. All we can do is 
simply give a chronology of his almost uncountable discoveries. 

1792–1794: Gauss reads the works of Newton, Euler, and Lagrange; discovers the
prime number theorem (at the age of 14 or 15); invents the method of least squares; 
conceives the Gaussian law of distribution in the theory of probability. 

1795: (only 18 years old!) Proves that a regular polygon with $n$ sides is constructible
(by ruler and compass) if and only if $n$ is the product of a power of 2 and distinct
prime numbers of the form $p_k = 2^{2^k} + 1$, and completely solves the 2000-year-old
problem of ruler-and-compass construction of regular polygons. He also discovers
the law of quadratic reciprocity.

1799: Proves the fundamental theorem of algebra in his doctoral dissertation 
using the then-mysterious complex numbers with complete confidence. 

1801: Gauss publishes his Disquisitiones Arithmeticae, in which he creates the
modern rigorous approach to mathematics; predicts the exact location of the asteroid
Ceres.

1807: Becomes professor of astronomy and the director of the new observatory at
Göttingen.

1809: Publishes his second book, Theoria motus corporum coelestium, a major
two-volume treatise on the motion of celestial bodies and the bible of planetary
astronomers for the next 100 years.

1812: Publishes Disquisitiones generales circa seriem infinitam, a rigorous treatment
of infinite series, and introduces the hypergeometric function for the first
time, for which he uses the notation F(α, β; γ; z); an essay on approximate
integration.

1820–1830: Publishes over 70 papers, including Disquisitiones generales circa
superficies curvas, in which he creates the intrinsic differential geometry of general
curved surfaces, the forerunner of Riemannian geometry and the general theory of
relativity.
From the 1830s on, Gauss was increasingly occupied with physics, and he
enriched every branch of the subject he touched. In the theory of surface tension,
he developed the fundamental idea of conservation of energy and solved the
earliest problem in the calculus of variations. In optics, he introduced the concept
of the focal length of a system of lenses. He virtually created the science of
geomagnetism, and in collaboration with his friend and colleague Wilhelm Weber he
invented the electromagnetic telegraph. In 1839 Gauss published his fundamental
paper on the general theory of inverse square forces, which established potential
theory as a coherent branch of mathematics and in which he proved the
divergence theorem.

Gauss had many opportunities to leave Göttingen, but he refused all offers and
remained there for the rest of his life, living quietly and simply, traveling rarely, and 
working with immense energy on a wide variety of problems in mathematics and 
its applications. Apart from science and his family—he married twice and had six 
children, two of whom emigrated to America—his main interests were history and 
world literature, international politics, and public finance. He owned a large library 
of about 6000 volumes in many languages, including Greek, Latin, English, French, 
Russian, Danish, and of course German. His acuteness in handling his own financial 
affairs is shown by the fact that although he started with virtually nothing, he left 
an estate over a hundred times as great as his average annual income during the last 
half of his life. 

The foregoing list is the published portion of Gauss’s total achievement; the
unpublished and private part is almost equally impressive. His scientific diary, a little
booklet of 19 pages, discovered in 1898, extends from 1796 to 1814 and consists of 146
very concise statements of the results of his investigations, which often occupied him
for weeks or months. These ideas were so abundant and so frequent that he
physically did not have time to publish them. Some of the ideas recorded in this diary:
Cauchy Integral Formula: Gauss discovers it in 1811, 16 years before Cauchy. 
Non-Euclidean Geometry: After failing to prove Euclid’s fifth postulate at the 
age of 15, Gauss came to the conclusion that the Euclidean form of geometry cannot 
be the only one possible. 

Elliptic Functions: Gauss had found many of the results of Abel and Jacobi (the 
two main contributors to the subject) before these men were born. The facts became 
known partly through Jacobi himself. His attention was caught by a cryptic passage 
in the Disquisitiones, whose meaning can only be understood if one knows
something about elliptic functions. He visited Gauss on several occasions to verify his
suspicions and tell him about his own most recent discoveries, and each time Gauss 
pulled 30-year-old manuscripts out of his desk and showed Jacobi what Jacobi had 
just shown him. After a week’s visit with Gauss in 1840, Jacobi wrote to his brother, 
“Mathematics would be in a very different position if practical astronomy had not 
diverted this colossal genius from his glorious career.” 

A possible explanation for not publishing such important ideas is suggested by 
his comments in a letter to Bolyai: “It is not knowledge but the act of learning, not 
possession but the act of getting there, which grants the greatest enjoyment. When 
I have clarified and exhausted a subject, then I turn away from it in order to go into 
darkness again.” His was the temperament of an explorer who is reluctant to take the 
time to write an account of his last expedition when he could be starting another. As 
it was, Gauss wrote a great deal, but to have published every fundamental discovery 
he made in a form satisfactory to himself would have required several long lifetimes. 
11.2.2 Confluent Hypergeometric Functions

The parameters $\alpha$, $\beta$, and $\gamma$ determine the behavior of the hypergeometric
function completely. A great number of differential equations in mathematical
physics correspond to the case where only two parameters are involved. The
most effective way of accommodating this arises from the confluence $\beta\to\infty$.
Let us see how this works.

Substitute $x = u/\beta$ in the hypergeometric DE, using the (very simple)
chain rule $d/dx = \beta\, d/du$, $d^2/dx^2 = \beta^2 d^2/du^2$ to transform the $x$-derivatives into $u$-derivatives. This leads to
the DE

$$\beta u\left(1-\frac{u}{\beta}\right)\frac{d^2y}{du^2}+\beta\left[\gamma-(\alpha+\beta+1)\frac{u}{\beta}\right]\frac{dy}{du}-\alpha\beta\, y=0.$$

Dividing the entire equation by $\beta$ and taking the limit $\beta\to\infty$, thus neglecting
$u/\beta$ and $1/\beta$, yields the so-called confluent hypergeometric differential
equation:

$$xy''+(\gamma-x)y'-\alpha y=0, \qquad (11.27)$$

where we restored $x$ as the independent variable.

The infinite series solution of this DE is called the confluent hypergeometric
function. This solution, as well as its integral representation, can be
obtained by taking the appropriate limit of the corresponding expression for
the hypergeometric function. The limit of Equation (11.23) yields


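$$\Phi(\alpha;\gamma;x)=\lim_{\beta\to\infty}F(\alpha,\beta;\gamma;x/\beta)=\frac{\Gamma(\gamma)}{\Gamma(\alpha)}\sum_{n=0}^{\infty}\frac{\Gamma(\alpha+n)}{\Gamma(\gamma+n)\,\Gamma(n+1)}\,x^n, \qquad (11.28)$$

where we used

$$\lim_{\beta\to\infty}\frac{\Gamma(\beta+n)}{\beta^n\,\Gamma(\beta)}=\lim_{\beta\to\infty}\frac{(\beta+n-1)(\beta+n-2)\cdots\beta\,\Gamma(\beta)}{\beta^n\,\Gamma(\beta)}=\lim_{\beta\to\infty}\left(\frac{\beta+n-1}{\beta}\right)\left(\frac{\beta+n-2}{\beta}\right)\cdots\left(\frac{\beta}{\beta}\right)=1.$$

Similarly, we have

$$\Phi(\alpha;\gamma;x)=\lim_{\beta\to\infty}F(\beta,\alpha;\gamma;x/\beta)=\lim_{\beta\to\infty}\frac{\Gamma(\gamma)}{\Gamma(\alpha)\,\Gamma(\gamma-\alpha)}\int_0^1 dt\,(1-t)^{\gamma-\alpha-1}\,t^{\alpha-1}\left(1-\frac{tx}{\beta}\right)^{-\beta},$$

where $(1-tx/\beta)^{-\beta}\to e^{tx}$ as $\beta\to\infty$ (Prob. 11.3), and where we have used the symmetry of the hypergeometric function under interchange of its first two parameters. It follows that the integral representation
of the confluent hypergeometric function is

$$\Phi(\alpha;\gamma;x)=\frac{\Gamma(\gamma)}{\Gamma(\alpha)\,\Gamma(\gamma-\alpha)}\int_0^1 dt\,(1-t)^{\gamma-\alpha-1}\,t^{\alpha-1}\,e^{tx}. \qquad (11.29)$$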
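As a numerical sanity check of (11.28) and (11.29), the sketch below sums the confluent series term by term, evaluates the integral representation with a midpoint rule, and compares F(α, β; γ; x/β) at a large but finite β with Φ(α; γ; x). The function names and the sample values α = 2, γ = 4, x = 0.7, β = 1e6 are illustrative assumptions; the integral form requires γ > α > 0.

```python
import math


def hyp2f1_series(a, b, c, x, terms=200):
    """Partial sum of the hypergeometric series F(a, b; c; x), valid for |x| < 1."""
    total, term = 0.0, 1.0
    for n in range(terms):
        total += term
        # ratio of consecutive terms: (a+n)(b+n) x / ((c+n)(n+1))
        term *= (a + n) * (b + n) / ((c + n) * (n + 1)) * x
    return total


def phi_series(a, c, x, terms=200):
    """Partial sum of the confluent series (11.28): sum of (a)_n x^n / ((c)_n n!)."""
    total, term = 0.0, 1.0
    for n in range(terms):
        total += term
        term *= (a + n) / ((c + n) * (n + 1)) * x
    return total


def phi_integral(a, c, x, steps=20000):
    """Integral representation (11.29) evaluated by the midpoint rule (needs c > a > 0)."""
    h = 1.0 / steps
    s = 0.0
    for i in range(steps):
        t = (i + 0.5) * h
        s += (1 - t) ** (c - a - 1) * t ** (a - 1) * math.exp(t * x)
    return math.gamma(c) / (math.gamma(a) * math.gamma(c - a)) * s * h


if __name__ == "__main__":
    a, c, x = 2.0, 4.0, 0.7
    beta = 1e6
    print(phi_series(a, c, x))                   # series (11.28)
    print(phi_integral(a, c, x))                 # integral (11.29)
    print(hyp2f1_series(a, beta, c, x / beta))   # F(a, beta; c; x/beta), large beta
```

The difference between the last value and the series shrinks roughly like 1/β, in line with the limit used above.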
We note that

$$\Phi(\alpha;\alpha;x)=\sum_{n=0}^{\infty}\frac{\Gamma(\alpha+n)}{\Gamma(\alpha+n)\,\Gamma(n+1)}\,x^n=\sum_{n=0}^{\infty}\frac{x^n}{n!}=e^x,$$

and Problem 11.20 shows that

$$\operatorname{erf}(x)=\frac{2x}{\sqrt{\pi}}\,\Phi\left(\tfrac12;\tfrac32;-x^2\right).$$
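Both special cases can be checked numerically against the standard-library functions math.exp and math.erf. The sketch below sums the confluent series term by term; the helper names phi_series and erf_via_phi and the sample arguments are assumptions made for illustration.

```python
import math


def phi_series(a, c, x, terms=200):
    """Partial sum of Phi(a; c; x) = sum over n of (a)_n x^n / ((c)_n n!)."""
    total, term = 0.0, 1.0
    for n in range(terms):
        total += term
        term *= (a + n) / ((c + n) * (n + 1)) * x
    return total


def erf_via_phi(x):
    """erf(x) written as (2x / sqrt(pi)) * Phi(1/2; 3/2; -x^2)."""
    return 2 * x / math.sqrt(math.pi) * phi_series(0.5, 1.5, -x * x)


for x in (0.25, 1.0, 2.0):
    # Phi(alpha; alpha; x) should reproduce e^x; alpha = 1.7 is arbitrary
    print(x, phi_series(1.7, 1.7, x), math.exp(x))
    print(x, erf_via_phi(x), math.erf(x))
```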


Many other functions encountered in mathematical physics can also be
expressed in terms of confluent hypergeometric functions, and we shall point
this out as we come across these functions in the sequel. We note in passing
that, as in the case of the hypergeometric function,


Box 11.2.2. If $\alpha$ happens to be a negative integer, then $\Phi(\alpha;\gamma;x)$ becomes
a polynomial of degree $-\alpha$, i.e., the infinite series truncates, because the coefficient
$\Gamma(\alpha+n)/\Gamma(\alpha)=\alpha(\alpha+1)\cdots(\alpha+n-1)$ vanishes for every $n>-\alpha$.
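To see the truncation concretely, the sketch below generates the coefficients (α)_n/((γ)_n n!) with exact rational arithmetic from Python's fractions module and stops at the first vanishing coefficient. For the arbitrary choice α = −2, γ = 3 it reproduces Φ(−2; 3; x) = 1 − 2x/3 + x²/12. The function name phi_coefficients is hypothetical.

```python
from fractions import Fraction


def phi_coefficients(a, c, max_terms=50):
    """Coefficients of Phi(a; c; x) = sum_n (a)_n x^n / ((c)_n n!), as exact fractions.

    Stops early once a coefficient vanishes, which happens when a is zero or a
    negative integer (the factor (a + n) becomes zero); c must not be 0, -1, -2, ..."""
    a, c = Fraction(a), Fraction(c)
    coeffs, term = [], Fraction(1)
    for n in range(max_terms):
        if term == 0:
            break
        coeffs.append(term)
        term = term * (a + n) / ((c + n) * (n + 1))
    return coeffs


print(phi_coefficients(-2, 3))                                  # terminates after x^2
print(len(phi_coefficients(Fraction(1, 2), 3, max_terms=10)))   # non-integer a: no truncation
```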


11.2.3 Bessel Functions 


Bessel functions are arguably among the most utilized functions of
mathematical physics. We shall come back to them when we consider solutions of
Laplace’s equation in cylindrical coordinates and discover their connection
with other functions treated in this chapter. At this point, we simply
introduce them as power series. The Bessel function $J_\nu(x)$ of order $\nu$ is a solution
of the Bessel differential equation:


$$x^2\frac{d^2y}{dx^2}+x\frac{dy}{dx}+\left(x^2-\nu^2\right)y=0. \qquad (11.30)$$


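Chapter 27 shows how to obtain the power series expansion of $J_\nu(x)$:

$$J_\nu(x)=\left(\frac{x}{2}\right)^{\nu}\sum_{k=0}^{\infty}\frac{(-1)^k}{k!\,\Gamma(\nu+k+1)}\left(\frac{x}{2}\right)^{2k}. \qquad (11.31)$$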
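The series (11.31) converges for every x and is easy to sum directly. The sketch below (function name assumed; it calls math.gamma, so it is meant for orders ν ≥ 0) builds each term from the previous one and can be compared with the known values J_0(1) = 0.7651976865579666 and J_1(1) = 0.4400505857449335.

```python
import math


def bessel_j(nu, x, terms=60):
    """Partial sum of (11.31): J_nu(x) = (x/2)^nu * sum_k (-1)^k (x/2)^(2k) / (k! Gamma(nu+k+1)).

    Intended for nu >= 0; each term is obtained from the previous one by
    multiplying by -(x/2)^2 / ((k+1)(nu+k+1))."""
    half = x / 2.0
    term = half ** nu / math.gamma(nu + 1)
    total = 0.0
    for k in range(terms):
        total += term
        term *= -half * half / ((k + 1) * (nu + k + 1))
    return total


print(bessel_j(0, 1.0))
print(bessel_j(1, 1.0))
```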


The point to emphasize is that 


Box 11.2.3. Bessel functions are always given in terms of their expansion
in power series (or as an integral involving parameters). It is generally
impossible to reduce Bessel functions to any functional combination
of more elementary functions such as polynomials, or trigonometric and
exponential functions.







Integrals and Series as Functions 


Properties and applications of Bessel functions are treated in some detail in
Chapter 27.¹⁰ However, some relations are elementary enough to be included
here, as they also illustrate the use of summation symbols. First note that if
$\nu$ is a negative integer $-m$, then


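$$J_{-m}(x)=\left(\frac{x}{2}\right)^{-m}\sum_{k=0}^{\infty}\frac{(-1)^k}{k!\,\Gamma(-m+k+1)}\left(\frac{x}{2}\right)^{2k}=\left(\frac{x}{2}\right)^{-m}\sum_{k=m}^{\infty}\frac{(-1)^k}{k!\,\Gamma(-m+k+1)}\left(\frac{x}{2}\right)^{2k},$$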



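because the first $m$ terms of the first series have gamma functions in the
denominator with negative integer (or zero) arguments, and $1/\Gamma$ vanishes at these points. Now in the second
series, replace $k$ by $n = k - m$. This yields

$$J_{-m}(x)=\left(\frac{x}{2}\right)^{-m}\sum_{n=0}^{\infty}\frac{(-1)^{m+n}}{(m+n)!\,\Gamma(n+1)}\left(\frac{x}{2}\right)^{2m+2n}=(-1)^m\left(\frac{x}{2}\right)^{m}\sum_{n=0}^{\infty}\frac{(-1)^n}{\Gamma(m+n+1)\,n!}\left(\frac{x}{2}\right)^{2n}=(-1)^m J_m(x), \qquad (11.32)$$

where we used $\Gamma(j + 1) = j!$ for nonnegative integer $j$.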
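Equation (11.32) can be checked numerically once the series is written with 1/Γ set to zero at 0, −1, −2, ..., which is exactly the fact used above. Python's math.gamma raises ValueError at those poles, so the sketch below wraps it in a hypothetical helper rgamma; the orders and arguments tested are arbitrary.

```python
import math


def rgamma(z):
    """Reciprocal gamma function 1/Gamma(z), returning 0 at the poles z = 0, -1, -2, ..."""
    if z <= 0 and float(z).is_integer():
        return 0.0
    return 1.0 / math.gamma(z)


def bessel_j(nu, x, terms=40):
    """Partial sum of (11.31) that also handles negative integer nu (x > 0)."""
    half = x / 2.0
    total = 0.0
    for k in range(terms):
        total += (-1) ** k * rgamma(nu + k + 1) / math.factorial(k) * half ** (2 * k + nu)
    return total


m, x = 3, 2.5
print(bessel_j(-m, x), (-1) ** m * bessel_j(m, x))
```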

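Example 11.2.1. Bessel functions of half-integer order are related to trigonometric
functions. To see this, note that

$$J_{1/2}(x)=\left(\frac{x}{2}\right)^{1/2}\sum_{k=0}^{\infty}\frac{(-1)^k}{k!\,\Gamma\left(k+\frac32\right)}\left(\frac{x}{2}\right)^{2k}=\left(\frac{2}{x}\right)^{1/2}\sum_{k=0}^{\infty}\frac{(-1)^k\,x^{2k+1}}{k!\,\Gamma\left(k+\frac32\right)2^{2k+1}}.$$

Now substitute for $\Gamma(k+\frac32)$ in terms of factorials as given in Problem 11.1 to obtain

$$J_{1/2}(x)=\sqrt{\frac{2}{\pi x}}\sum_{k=0}^{\infty}\frac{(-1)^k}{(2k+1)!}\,x^{2k+1}=\sqrt{\frac{2}{\pi x}}\,\sin x.$$

Similarly,

$$J_{-1/2}(x)=\sqrt{\frac{2}{\pi x}}\,\cos x,$$

as the reader may verify.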
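The closed forms for J_{±1/2} are easy to confirm numerically. The sketch below sums the series (11.31) (no poles of Γ arise for ν = ±1/2, so math.gamma can be used directly) and compares it with √(2/(πx)) sin x and √(2/(πx)) cos x at a few sample points; the helper name is an assumption.

```python
import math


def bessel_j(nu, x, terms=60):
    """Partial sum of (11.31); adequate for nu = +1/2 and -1/2 with x > 0."""
    half = x / 2.0
    total = 0.0
    term = half ** nu / math.gamma(nu + 1)
    for k in range(terms):
        total += term
        term *= -half * half / ((k + 1) * (nu + k + 1))
    return total


for x in (0.3, 1.0, 2.7, 5.0):
    amp = math.sqrt(2.0 / (math.pi * x))
    # both differences should be at the level of rounding error
    print(x, bessel_j(0.5, x) - amp * math.sin(x), bessel_j(-0.5, x) - amp * math.cos(x))
```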



10 See also Hassani, S. Mathematical Physics: A Modern Introduction to Its Foundations,
Springer-Verlag, 1999, Section 14.5.
Another formula of interest is a recursion relation connecting Bessel
functions of different integer orders. Write $J_{m-1}(x)$ as


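$$J_{m-1}(x)=\left(\frac{x}{2}\right)^{m-1}\sum_{k=0}^{\infty}\frac{(-1)^k}{k!\,\Gamma(m+k)}\left(\frac{x}{2}\right)^{2k}=\left(\frac{x}{2}\right)^{m-1}\left[\frac{1}{\Gamma(m)}+\sum_{k=1}^{\infty}\frac{(-1)^k}{k!\,\Gamma(m+k)}\left(\frac{x}{2}\right)^{2k}\right], \qquad (11.33)$$

where we separated the $k = 0$ term from the rest of the sum. Similarly, write
$J_{m+1}(x)$ as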


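$$\begin{aligned}J_{m+1}(x)&=\left(\frac{x}{2}\right)^{m+1}\sum_{k=0}^{\infty}\frac{(-1)^k}{k!\,\Gamma(m+k+2)}\left(\frac{x}{2}\right)^{2k}\\&=\left(\frac{x}{2}\right)^{m+1}\sum_{j=1}^{\infty}\frac{(-1)^{j-1}}{(j-1)!\,\Gamma(m+j+1)}\left(\frac{x}{2}\right)^{2j-2}\\&=-\left(\frac{x}{2}\right)^{m-1}\sum_{k=1}^{\infty}\frac{(-1)^k}{(k-1)!\,\Gamma(m+k+1)}\left(\frac{x}{2}\right)^{2k},\end{aligned}\qquad (11.34)$$

where in the second line we substituted $j = k + 1$ for $k$, and in the last line
we used $(-1)^{-1} = -1$, factored $(x/2)^{-2}$ out of the summation, and changed
the dummy index back to $k$. Now add Equations (11.33) and (11.34) and use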

$$\frac{1}{k!\,\Gamma(m+k)}-\frac{1}{(k-1)!\,\Gamma(m+k+1)}=\frac{m}{k!\,\Gamma(m+k+1)}$$

and $1/\Gamma(m) = m/\Gamma(m + 1)$ to obtain


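$$J_{m-1}(x)+J_{m+1}(x)=\left(\frac{x}{2}\right)^{m-1}\left[\frac{m}{\Gamma(m+1)}+\sum_{k=1}^{\infty}\frac{(-1)^k\,m}{k!\,\Gamma(m+k+1)}\left(\frac{x}{2}\right)^{2k}\right]=m\left(\frac{x}{2}\right)^{-1}\left(\frac{x}{2}\right)^{m}\sum_{k=0}^{\infty}\frac{(-1)^k}{k!\,\Gamma(m+k+1)}\left(\frac{x}{2}\right)^{2k}=\frac{2m}{x}J_m(x)$$

or, finally,

$$J_{m-1}(x)+J_{m+1}(x)=\frac{2m}{x}J_m(x). \qquad (11.35)$$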

The straightforward details are left as Problem 11.22. One can also show that

$$J_{m-1}(x)-J_{m+1}(x)=2J_m'(x), \qquad (11.36)$$


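where the prime indicates differentiation. Equations (11.35) and (11.36) lead to

$$J_{m-1}(x)=\frac{m}{x}J_m(x)+J_m'(x),\qquad J_{m+1}(x)=\frac{m}{x}J_m(x)-J_m'(x). \qquad (11.37)$$

Although we wrote them for integer $m$, nothing in these derivations uses the integrality of $m$, so Equations (11.35) through (11.37) hold for any order. These, plus the results of Example 11.2.1, give all Bessel functions of half-integer order.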
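A numerical spot check of (11.35) and (11.36), with J′_m approximated by a central difference, is sketched below, together with J_{3/2} obtained from (11.35) with m = 1/2, namely J_{3/2}(x) = J_{1/2}(x)/x − J_{−1/2}(x) = √(2/(πx))(sin x/x − cos x). The step size h and the sample values are assumptions; the half-integer step relies on the recurrence holding for non-integer order.

```python
import math


def bessel_j(nu, x, terms=60):
    """Partial sum of (11.31), used here for x > 0 and orders nu > -1."""
    half = x / 2.0
    total = 0.0
    term = half ** nu / math.gamma(nu + 1)
    for k in range(terms):
        total += term
        term *= -half * half / ((k + 1) * (nu + k + 1))
    return total


def bessel_j_prime(nu, x, h=1e-5):
    """Central-difference approximation to dJ_nu/dx."""
    return (bessel_j(nu, x + h) - bessel_j(nu, x - h)) / (2 * h)


m, x = 2, 1.7
print(bessel_j(m - 1, x) + bessel_j(m + 1, x), 2 * m / x * bessel_j(m, x))   # (11.35)
print(bessel_j(m - 1, x) - bessel_j(m + 1, x), 2 * bessel_j_prime(m, x))     # (11.36)

# J_{3/2} from (11.35) with m = 1/2, compared with the closed form
j32 = bessel_j(0.5, x) / x - bessel_j(-0.5, x)
print(j32, math.sqrt(2 / (math.pi * x)) * (math.sin(x) / x - math.cos(x)))
```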
11.1. (a) Show that (see Definition 11.1.2 for the definition of the following
notation):

$$(2n)!!=2^n\,n! \quad\text{and}\quad (2n-1)!!=\frac{(2n)!}{2^n\,n!}.$$

Hint: For the second relation, supply the missing even factors in the “numerator”
and the “denominator” of $(2n-1)!!$.

(b) Using (a) and Example 11.1.1, show that

$$\Gamma\left(n+\tfrac12\right)=\frac{(2n-1)!!}{2^n}\sqrt{\pi}=\frac{(2n)!}{2^{2n}\,n!}\sqrt{\pi}.$$

(c) Now use (b) to obtain the following result:

$$\Gamma\left(n+\tfrac32\right)=\frac{(2n+1)!}{2^{2n+1}\,n!}\sqrt{\pi}.$$

11.2. Using the result of Problem 11.1, show that 


$$(2n+m)!=\frac{1}{\sqrt{\pi}}\,2^{2n+m}\,\Gamma\left(n+\frac{m}{2}+1\right)\Gamma\left(n+\frac{m}{2}+\frac12\right).$$


Hint: Consider the two cases of even m (with m = 2k) and odd m (with 
m = 2k + 1 ) separately, and show at the end that both can be written as a 
single formula. 


11.3. Using the result

$$\lim_{n\to\infty}\left(1+\frac{1}{n}\right)^{n}=e,$$

show that

$$\lim_{n\to\infty}\left(1-\frac{t}{n}\right)^{n}=e^{-t}.$$

Hint: Let $n = -tm$.


11.4. (a) By using Equation (11.2) repeatedly, show that

$$\Gamma(\alpha+n)=(\alpha+n-1)(\alpha+n-2)\cdots(\alpha+n-k)\,\Gamma(\alpha+n-k).$$

(b) Let $k = n$ in the above equation to show that

$$\alpha(\alpha+1)\cdots(\alpha+n-1)=\frac{\Gamma(\alpha+n)}{\Gamma(\alpha)}.$$

(c) Using (b) show that

$$\alpha(\alpha-1)\cdots(\alpha-n+1)=(-1)^n\,\frac{\Gamma(n-\alpha)}{\Gamma(-\alpha)}.$$
11.5. Show that

$$\Gamma(x)=2\int_0^\infty e^{-t^2}\,t^{2x-1}\,dt$$

and

™-jfM DP* 

11.6. Find the following integrals in terms of the gamma function:

(a) $\int_0^\infty t^{2x+1}e^{-at^2}\,dt$. (b) $\int_0^\infty t^{2x}e^{-at^2}\,dt$.

11.7. Using only its integral representation, show that the beta function is
symmetric under interchange of its arguments.

11.8. Using the definition of the gamma function, show the justification for 
the frequently used equality 0! = 1. 


11.9. Show that

$$\int_0^{\varphi}\sqrt{1+k^2\sin^2 t}\,dt=\sqrt{1+k^2}\left[E(k')-E\left(\frac{\pi}{2}-\varphi,\,k'\right)\right],$$

where $k' = k/\sqrt{1+k^2}$. Hint: Change $t$ to $s = \pi/2 - t$ and break up the
interval of integration of the resulting integral into two.

11.10. Show that the circumference of an ellipse of respective semi-major and
semi-minor axes $a$ and $b$ is $4aE(k)$, where $k = \sqrt{a^2-b^2}/a$. Verify that you
get the expected result when $a = b$.

11.11. (a) Expand the square roots in the definition of the elliptic integrals 
of the first and second kinds in powers of k 2 sin 2 t, and keep the first three 
terms. 

(b) Now integrate those terms to find an approximation to elliptic integrals 
for small k. 

(c) Substitute $\pi/2$ for $\varphi$ to obtain approximations for the complete elliptic
integrals.

11.12. Use the result of Problem 11.11 to obtain Equation (11.21). 


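11.13. Use the integral

$$\int_0^{\pi/2}\sin^{2n}t\,dt=\frac{(2n-1)!!}{(2n)!!}\,\frac{\pi}{2}$$

to show that

$$E(k)=\frac{\pi}{2}\left\{1-\sum_{n=1}^{\infty}\left[\frac{(2n-1)!!}{(2n)!!}\right]^2\frac{k^{2n}}{2n-1}\right\},\qquad K(k)=\frac{\pi}{2}\left\{1+\sum_{n=1}^{\infty}\left[\frac{(2n-1)!!}{(2n)!!}\right]^2k^{2n}\right\}.$$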
11.14. Show that $E(0) = K(0) = \pi/2$, and that $E(1) = 1$, $K(1) = \infty$.

11.15. Use the ratio test on the hypergeometric series to determine its radius 
of convergence. 

11.16. Verify that the complete elliptic integral of the first kind is related to 
the hypergeometric function as follows: 

$$K(k)=\frac{\pi}{2}\,F\left(\tfrac12,\tfrac12;1;k^2\right).$$

11.17. Show that $\ln(1 + x) = xF(1, 1; 2; -x)$.

11.18. Use the result of Problem 11.4 to express Equation (10.15) of Chapter
10 in terms of the gamma function; then show that

$$(1+x)^{\alpha}=\sum_{n=0}^{\infty}\frac{\Gamma(n-\alpha)}{\Gamma(-\alpha)\,\Gamma(n+1)}\,(-x)^n=F(-\alpha,\beta;\beta;-x)$$

for arbitrary $\beta$.


11.19. By using integral representations:
(a) Show that

$$B(a,b)=\frac{\Gamma(a)\,\Gamma(b+r)}{\Gamma(a+b+r)}\,F(a,r;a+b+r;1),$$

where $B$ is the beta function and $r$ is any real number. Choose $r$ appropriately
and show that

$$B(a,b)=\frac{1}{a}\,F(a,1-b;a+1;1).$$

(b) Also prove that

$$F(\alpha,\beta;\gamma;1)=\frac{\Gamma(\gamma)\,\Gamma(\gamma-\alpha-\beta)}{\Gamma(\gamma-\alpha)\,\Gamma(\gamma-\beta)}.$$


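11.20. Expand the integrand of $\operatorname{erf}(x)$ in its Maclaurin series and use

$$\frac{1}{2n+1}=\frac{1}{2\left(n+\frac12\right)}=\frac{\Gamma\left(n+\frac12\right)}{2\,\Gamma\left(n+\frac32\right)}$$

to show that

$$\operatorname{erf}(x)=\frac{2x}{\sqrt{\pi}}\,\Phi\left(\tfrac12;\tfrac32;-x^2\right).$$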


11.21. Using the same procedure as in Example 11.2.1, show that 


$$J_{-1/2}(x)=\sqrt{\frac{2}{\pi x}}\,\cos x.$$
11.22. Show that

$$\frac{1}{k!\,\Gamma(m+k)}-\frac{1}{(k-1)!\,\Gamma(m+k+1)}=\frac{m}{k!\,\Gamma(m+k+1)}$$

and use it to derive Equation (11.35).

11.23. Derive Equation (11.36). 

11.24. Find $J_{3/2}(2)$ and $J_{-3/2}(2)$. Hint: Use Equation (11.37).
Vectors and Derivatives


One of the basic tools of physics is the calculus of vectors. A great variety 
of physical quantities are vectors which are functions of several variables such 
as space coordinates and time, and, as such, are good candidates for
mathematical analysis. We have already encountered examples of such analyses in
our treatment of the integration of vectors as in calculating electric, magnetic, 
and gravitational fields. However, vector analysis goes beyond simple vector 
integration. Vectors have a far richer structure than ordinary numbers, and, 
therefore, allow a much broader range of concepts. 

Fundamental to the study of vector analysis is the notion of field, with 
which we have some familiarity based on our study of Chapters 1 and 4. 

Fields play a key role in many areas of physics: in the motion of fluids, in the
conduction of heat, in electromagnetic theory, in gravitation, and so forth. All
these situations involve a physical quantity that varies from point to point as
well as from time to time;¹ i.e., it is a function of space coordinates and time.

This physical quantity can be either a scalar, in which case we speak of a
scalar field, or a vector, in which case we speak of a vector field. There are
also tensor fields, which we shall discuss briefly in Chapter 17, and spinor
fields, which are beyond the scope of this book.

The temperature of the atmosphere is a scalar field because it is a function
of space coordinates (equator versus the poles) and time (summer versus
winter), and because temperature has no direction associated with it. On the
other hand, wind velocity is a vector field because (a) it is a vector and (b) its 
magnitude and direction depend on space coordinates and time. In general, 
when we talk of a vector field, we are dealing with three functions of space 
and time, corresponding to the three components of the vector. 


1 In many instances fields are independent of time, in which case we call them static
fields.
12.1 Solid Angle

Before discussing the calculus of vectors, we want to introduce the concept of
a solid angle, which is an important and recurrent concept in mathematical
physics, especially in the discussion of vector calculus.

12.1.1 Ordinary Angle Revisited 

We start with the concept of angle from a new perspective which easily
generalizes to solid angle. Consider a curve and a point P in a plane. The point P
is taken to lie off the curve [Figure 12.1(a)]. An arbitrary segment of the curve 
defines an angle which is obtained by joining the two ends of the segment to 
P. In particular, an element of length along the curve defines an infinitesimal 
angle. We want to relate the length of this element to the size of its angle 
measured in radians. 

Connect P to the midpoint of the infinitesimal line element of length $\Delta l$,
and call the resulting vector $\mathbf{R}$, with the corresponding unit vector $\mathbf{e}_R$, as shown
in Figure 12.1(a).² Let $\alpha$ be the angle between $\mathbf{e}_R$ and the unit normal $\mathbf{e}_n$ to the
length element.³ As shown in the magnified diagram of Figure 12.1(b),
$\alpha$ is also the angle between the line element $QQ'$ and the line segment obtained
by dropping a perpendicular $QH$ onto the ray $PQ'$. It is clear from the
diagram that

$$QH = QQ'\cos\alpha \;\Rightarrow\; QH = \Delta l\cos\alpha = \Delta l\,\mathbf{e}_R\cdot\mathbf{e}_n.$$

Now recall that the measure of an angle in radians is given by the ratio of 
the length of the arc of a circle subtended by the angle to the radius of the 
circle, and this measure is independent of the size of the circle chosen. To 
find the measure of $\Delta\theta$ in radians, let us choose a circle of radius $R = |\mathbf{R}|$,



Figure 12.1: Defining angles as ratios of lengths. 

2 In actual calculations, it is convenient to denote the position vector of P by r, say, and
that of the midpoint by r′. Then R = r′ − r.

3 There are two possible directions for this unit normal: one as shown in Figure 12.1, 
and the other in the opposite direction. As long as we deal with open curves (no loops) 
this arbitrariness persists. 
the distance from P to the midpoint of the line element. The arc of this circle
subtended by $\Delta\theta$ is $CC'$, and the figure shows that the length of this arc is
very nearly equal to $QH$. One can think of $CC'$ as the projection of the line
element onto the circle. Thus,

$$\Delta\theta=\frac{CC'}{R}\approx\frac{QH}{R}=\frac{\Delta l\,\mathbf{e}_R\cdot\mathbf{e}_n}{R}.$$

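If we denote the location of P by $\mathbf{r}$ and that of $\Delta l$ by $\mathbf{r}'$, then

$$\mathbf{e}_R=\frac{\mathbf{R}}{R}=\frac{\mathbf{r}'-\mathbf{r}}{|\mathbf{r}'-\mathbf{r}|},$$

and we obtain

$$\Delta\theta=\frac{\Delta l(\mathbf{r}')\,\mathbf{e}_n\cdot(\mathbf{r}'-\mathbf{r})}{|\mathbf{r}'-\mathbf{r}|^2}, \qquad (12.1)$$

where we have emphasized the dependence of $\Delta l$ on $\mathbf{r}'$.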

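For a finite segment of the curve, we integrate to obtain the angle. This
yields

$$\theta=\int_a^b\frac{dl\,\mathbf{e}_R\cdot\mathbf{e}_n}{R}=\int_a^b\frac{dl(\mathbf{r}')\,\mathbf{e}_n\cdot(\mathbf{r}'-\mathbf{r})}{|\mathbf{r}'-\mathbf{r}|^2}, \qquad (12.2)$$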


where $a$ and $b$ are the beginning and the end of the finite segment. There is a
way of calculating this finite angle which, although extremely simple-minded,
is useful when we generalize to solid angle. Since the size of the circle used to
measure the angle is irrelevant, let us choose a single fiducial circle of radius $a$
centered at P (see Figure 12.2). Then, as we project elements of length from
the curve, we obtain infinitesimal arcs of this circle with the property that

$$\frac{dl\,\mathbf{e}_R\cdot\mathbf{e}_n}{R}=\frac{dl_c}{a},$$

where $dl_c$ is the element of arc of the fiducial circle. From this equation, we
obtain

$$\theta=\int_{a'}^{b'}\frac{dl_c}{a}=\frac{s}{a}, \qquad (12.3)$$

where $a'$ and $b'$ are projections of $a$ and $b$ on the circle, and $s$ is the length of
the arc from $a'$ to $b'$. This last relation is, of course, our starting point, where
we defined the measure of an angle in radians!

Of special interest is the case where the curve loops back on itself. For
such a case, the direction of $\mathbf{e}_n$ is predetermined by


Box 12.1.1. ( Convention ). We agree that for angle calculations, the 
unit normal shall always point out of a closed loop. 
Figure 12.2: Total angle subtended by a closed curve about a point (a) inside and (b)
outside.


If P happens to be inside the loop [Figure 12.2(a)], the total angle,
corresponding to a complete traversal of the loop, is

$$\theta=\frac{s}{a}=\frac{2\pi a}{a}=2\pi.$$

When P is outside, we get $\theta = 0$. This can be seen in Figure 12.2(b), where the
projection of the closed curve covers only a portion of the fiducial circle and it
does so twice: once with a positive sign, when $\mathbf{e}_R$ and $\mathbf{e}_n$ are separated by an
acute angle, and once with a negative sign, when $\mathbf{e}_R$ and $\mathbf{e}_n$ are separated
by an obtuse angle. Let us denote by $\theta_P$ the total angle subtended by the
closed curve $C$ about a point P and by $U$ the region enclosed by $C$. Then,
we have

$$\theta_P=\begin{cases}2\pi & \text{if P is in } U,\\ 0 & \text{if P is not in } U.\end{cases} \qquad (12.4)$$


Example 12.1.1. Point P is located outside a rectangle of sides $2a$ and $2b$ as
shown in Figure 12.3. We want to verify Equation (12.4). The integration is
naturally divided into four regions: right, top, left, and bottom.

Figure 12.3: Total angle subtended by a rectangle about a point outside.

We shall do the right-hand-side integration in detail, leaving the rest for the reader to verify. For
the right side we have $\mathbf{r} = y_0\mathbf{e}_y$, $\mathbf{r}' = a\mathbf{e}_x + y'\mathbf{e}_y$, and

$$dl = dy',\qquad \mathbf{R}=\mathbf{r}'-\mathbf{r}=(a,\;y'-y_0),\qquad \mathbf{e}_n=\mathbf{e}_x.$$

Therefore,

$$d\theta_r=\frac{dl\,\mathbf{e}_R\cdot\mathbf{e}_n}{R}=\frac{dy'\,\mathbf{R}\cdot\mathbf{e}_x}{R^2}=\frac{a\,dy'}{a^2+(y'-y_0)^2},$$

and the total integrated angle for the right side is

$$\theta_r=a\int_{-b}^{b}\frac{dy'}{a^2+(y'-y_0)^2}=\tan^{-1}\left(\frac{y_0+b}{a}\right)-\tan^{-1}\left(\frac{y_0-b}{a}\right)=\left(\frac{\pi}{2}-\alpha\right)-\left(\frac{\pi}{2}-\beta\right)=\beta-\alpha,$$

where $\alpha$ and $\beta$ are the angles marked in Figure 12.3, so that $\tan\alpha=a/(y_0+b)$ and $\tan\beta=a/(y_0-b)$. Similarly, one can easily show that $\theta_t = -2\beta$, $\theta_l = \beta - \alpha$, and $\theta_b = 2\alpha$, where $t$
stands for “top,” $l$ for “left,” and $b$ for “bottom.” The total subtended angle is,
therefore, zero, as expected. Note that only for the top side is the angle between $\mathbf{e}_n$
and $\mathbf{e}_R$ obtuse, and this fact results in the negative value for $\theta_t$. ■
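Equation (12.4) and the side angles of this example can be checked by evaluating (12.2) numerically. The sketch below applies the midpoint rule to each side of the rectangle |x| ≤ a, |y| ≤ b with outward normals; the helper names and the sample values a = b = 1, y_0 = 3, and the interior point (0.2, −0.1) are arbitrary assumptions.

```python
import math


def segment_angle(p, start, end, normal, n=20000):
    """Angle (12.2) subtended at point p by the straight segment start -> end.

    Midpoint rule for the integral of dl * e_n . (r' - r) / |r' - r|^2,
    where `normal` is the unit normal e_n of the segment."""
    length = math.hypot(end[0] - start[0], end[1] - start[1])
    dl = length / n
    total = 0.0
    for i in range(n):
        s = (i + 0.5) / n
        rx = start[0] + s * (end[0] - start[0]) - p[0]
        ry = start[1] + s * (end[1] - start[1]) - p[1]
        total += dl * (normal[0] * rx + normal[1] * ry) / (rx * rx + ry * ry)
    return total


def rectangle_angle(p, a, b, n=20000):
    """Total angle subtended at p by the rectangle |x| <= a, |y| <= b (outward normals)."""
    sides = [
        ((a, -b), (a, b), (1.0, 0.0)),     # right
        ((a, b), (-a, b), (0.0, 1.0)),     # top
        ((-a, b), (-a, -b), (-1.0, 0.0)),  # left
        ((-a, -b), (a, -b), (0.0, -1.0)),  # bottom
    ]
    return sum(segment_angle(p, s, e, nrm, n) for s, e, nrm in sides)


a, b, y0 = 1.0, 1.0, 3.0
alpha, beta = math.atan(a / (y0 + b)), math.atan(a / (y0 - b))
print(segment_angle((0.0, y0), (a, -b), (a, b), (1.0, 0.0)), beta - alpha)  # right side
print(rectangle_angle((0.0, y0), a, b))                  # P outside: close to 0
print(rectangle_angle((0.2, -0.1), a, b), 2 * math.pi)   # P inside: close to 2*pi
```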


The purpose of the whole discussion of the ordinary angle in such a
highbrow fashion and detail has been to lay the groundwork for the introduction
of the solid angle. As we shall see shortly, a good understanding of the new
properties of the ordinary angle discussed above makes the transition to the
solid angle almost trivial.


12.1.2 Solid Angle 

We are now ready to generalize the notion of the angle to one dimension 
higher. Instead of a curve we have a surface, instead of a line element we have 
an area element, and instead of dividing by R we need to divide by R 2 . This 
last requirement is necessary to render the “angle” dimensionless. Referring 
to Figure 12.4,



Figure 12.4: Solid angle as the ratio of area to distance squared. 
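Box 12.1.2. We define the solid angle subtended by the element of area $\Delta a$ as

$$\Delta\Omega=\frac{\mathbf{e}_R\cdot\Delta\mathbf{a}}{R^2}=\frac{\mathbf{e}_R\cdot\mathbf{e}_n\,\Delta a(\mathbf{r}')}{R^2},$$

where $\mathbf{e}_n$ is the unit normal to the surface and $\Delta\mathbf{a} = \mathbf{e}_n\,\Delta a(\mathbf{r}')$.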


The numerator is simply the projection of $\Delta\mathbf{a}$ onto a sphere of radius $R$,
as Figure 12.5 shows. This projection is obtained by the intersection of the
fiducial sphere and the rays drawn from P to the boundary of $\Delta a$. As in the
case of the angle, the choice of fiducial sphere is arbitrary. The integral form
of the above equation is

$$\Omega=\iint_S\frac{\mathbf{e}_R\cdot d\mathbf{a}}{R^2}=\iint_S\frac{\mathbf{R}\cdot d\mathbf{a}}{R^3}=\iint_S\frac{(\mathbf{r}'-\mathbf{r})\cdot d\mathbf{a}(\mathbf{r}')}{|\mathbf{r}'-\mathbf{r}|^3}, \qquad (12.5)$$

where $S$ is the surface subtended by the solid angle $\Omega$.


Box 12.1.3. ( Convention ). For any closed surface S, we take e n to be 
pointing outward. 


If we use a single fiducial sphere of radius $b$ for all points of $S$, we obtain

$$\Omega=\iint_S\frac{\mathbf{e}_R\cdot d\mathbf{a}}{R^2}=\iint_{S_b}\frac{da}{b^2}=\frac{1}{b^2}\iint_{S_b}da=\frac{A}{b^2}, \qquad (12.6)$$


where $S_b$ is the projection of $S$ onto the fiducial sphere and $A$ its area. This
equation is the analog of Equation (12.3) and can be used to define the measure
of solid angles. In particular, if the surface $S$ is closed and P is inside, then
$A$ will be the total area of the fiducial sphere and we get $\Omega = 4\pi b^2/b^2 = 4\pi$.
When P is outside, we get equal amounts of positive and negative
contributions with the net result of zero.

Figure 12.5: The relation between $\mathbf{e}_R\cdot\Delta\mathbf{a}$ and its projection on a fiducial sphere.

Theorem 12.1.2. Denote by $\Omega_P$ the total solid angle subtended by the closed
surface $S$ about a point P and by $V$ the region enclosed by $S$. Then,

$$\Omega_P=\begin{cases}4\pi & \text{if P is in } V,\\ 0 & \text{if P is not in } V.\end{cases} \qquad (12.7)$$


Example 12.1.3. As an example of the calculation of the solid angle, consider a
square of side $2a$ with the point P located a distance $z_0$ from its center as shown in
Figure 12.6. With $\mathbf{r} = (0,0,z_0)$ and $\mathbf{r}' = (x', y', 0)$, we have $\mathbf{R} = \mathbf{r}' - \mathbf{r} = (x', y', -z_0)$,
and assuming that $\mathbf{e}_n$ points in the negative $z$-direction,⁴ we have

$$d\Omega=\frac{da\,\mathbf{e}_n\cdot\mathbf{e}_R}{R^2}=\frac{dx'\,dy'\,(-\mathbf{e}_z)\cdot\mathbf{R}}{R^3}=\frac{z_0\,dx'\,dy'}{(x'^2+y'^2+z_0^2)^{3/2}}.$$

The solid angle is obtained by integrating this:

$$\Omega=z_0\int_{-a}^{a}dx'\int_{-a}^{a}\frac{dy'}{(x'^2+y'^2+z_0^2)^{3/2}}=2az_0\int_{-a}^{a}\frac{dx'}{\sqrt{x'^2+a^2+z_0^2}\,(x'^2+z_0^2)}=4\tan^{-1}\left(\frac{a^2}{z_0\sqrt{2a^2+z_0^2}}\right).$$

An interesting special case is when $z_0 = a$. Then

$$\Omega=4\tan^{-1}\left(\frac{a^2}{a\sqrt{3a^2}}\right)=4\tan^{-1}\left(\frac{1}{\sqrt3}\right)=4\left(\frac{\pi}{6}\right)=\frac{2\pi}{3}.$$

The last result can also be derived in a simpler way. When $z_0 = a$, the point P will
be at the center of a cube of side $2a$. Since the total solid angle subtended about P
is $4\pi$, and all six sides contribute equally, the solid angle subtended by one side is
$4\pi/6 = 2\pi/3$. ■
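The double integral for Ω can be evaluated numerically and compared with the closed form and with the cube argument. The sketch below uses a two-dimensional midpoint rule over the square; the function names, the grid size n, and the sample values of a and z_0 are illustrative choices.

```python
import math


def square_solid_angle_numeric(a, z0, n=500):
    """Midpoint-rule estimate of z0 * double integral over |x'|,|y'| <= a of (x'^2 + y'^2 + z0^2)^(-3/2)."""
    h = 2.0 * a / n
    total = 0.0
    for i in range(n):
        x = -a + (i + 0.5) * h
        for j in range(n):
            y = -a + (j + 0.5) * h
            total += (x * x + y * y + z0 * z0) ** -1.5
    return z0 * total * h * h


def square_solid_angle_exact(a, z0):
    """Closed form 4 arctan(a^2 / (z0 sqrt(2a^2 + z0^2)))."""
    return 4.0 * math.atan(a * a / (z0 * math.sqrt(2 * a * a + z0 * z0)))


print(square_solid_angle_numeric(1.0, 0.5), square_solid_angle_exact(1.0, 0.5))
print(6 * square_solid_angle_exact(1.0, 1.0), 4 * math.pi)   # six faces of a cube
```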



Figure 12.6: The solid angle subtended by a square of side 2a. 

4 This assumption is not forced by any convention. It is chosen to make the final result
positive.


Example 12.1.4. Let us replace the square of the last example with a circle of
radius $a$. We can proceed along the same lines as before. However, in this particular
case, we note that the solid angle is in the shape of a cone, which is one of the
primary surfaces of the spherical coordinate system. Placing the origin at P and
projecting the area on a fiducial sphere, of radius $b$ say, we may write

$$\Omega=\frac{A_b}{b^2}=\frac{2\pi b^2(1-\cos\alpha)}{b^2}=2\pi(1-\cos\alpha),$$

where $A_b = 2\pi b^2(1 - \cos\alpha)$ is the area of the projection of the circle on the fiducial
sphere. The half-angle of the cone is denoted by $\alpha$, with

$$\tan\alpha=\frac{a}{z_0}\;\Rightarrow\;\cos\alpha=\frac{z_0}{\sqrt{a^2+z_0^2}}.$$

The final result is

$$\Omega=2\pi\left(1-\frac{z_0}{\sqrt{a^2+z_0^2}}\right). \qquad (12.8)$$

It is instructive to obtain this result directly as in the previous example.
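Following the suggestion to obtain (12.8) directly: with polar coordinates (ρ, φ′) on the disk, R = (ρ cos φ′, ρ sin φ′, −z_0) and da = ρ dρ dφ′, so the integrand becomes z_0 ρ/(ρ² + z_0²)^{3/2}, and the φ′ integration contributes a factor 2π. The sketch below evaluates the remaining radial integral by the midpoint rule and compares it with (12.8); the function names and sample values are assumptions.

```python
import math


def disk_solid_angle_numeric(a, z0, n=20000):
    """2*pi*z0 * integral_0^a rho / (rho^2 + z0^2)^(3/2) d rho, by the midpoint rule."""
    h = a / n
    s = 0.0
    for i in range(n):
        rho = (i + 0.5) * h
        s += rho / (rho * rho + z0 * z0) ** 1.5
    return 2 * math.pi * z0 * s * h


def disk_solid_angle_exact(a, z0):
    """Equation (12.8): 2*pi*(1 - z0 / sqrt(a^2 + z0^2))."""
    return 2 * math.pi * (1 - z0 / math.sqrt(a * a + z0 * z0))


print(disk_solid_angle_numeric(1.0, 0.75), disk_solid_angle_exact(1.0, 0.75))
```

As z_0 approaches 0 the exact formula tends to 2π, the solid angle of a hemisphere, as expected for a point sitting on the disk.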


12.2 Time Derivative of Vectors 


Scalar and vector fields can be subjected to such analytic operations as
differentiation and integration to obtain new scalar and vector fields. The
derivative of a vector with respect to a variable (say time) in Cartesian coordinates
amounts to differentiating each component:

$$\frac{\partial\mathbf{A}}{\partial t}=\frac{\partial A_x}{\partial t}\mathbf{e}_x+\frac{\partial A_y}{\partial t}\mathbf{e}_y+\frac{\partial A_z}{\partial t}\mathbf{e}_z. \qquad (12.9)$$


In other coordinate systems, one needs to differentiate the unit vectors as well. 

In general, the derivative of a vector is defined in exactly the same manner 
as for ordinary functions. We have to keep in mind that a vector physical 
quantity, such as an electric field, is a function of space and time, i.e., its 
components are real-valued functions of space and time. So, consider a vector 
$\mathbf{A}$ which is a function of a number of independent variables $(t_1, t_2, \ldots, t_m)$.

Then, we define the partial derivative as before:

$$\left.\frac{\partial\mathbf{A}}{\partial t_k}\right|_{(a_1,a_2,\ldots,a_m)}=\lim_{\epsilon\to0}\frac{\mathbf{A}(a_1,\ldots,a_k+\epsilon,\ldots,a_m)-\mathbf{A}(a_1,\ldots,a_k,\ldots,a_m)}{\epsilon}. \qquad (12.10)$$

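As immediate consequences of this definition, we list the following useful
relations:

$$\frac{\partial}{\partial t_k}(\mathbf{A}\cdot\mathbf{B})=\frac{\partial\mathbf{A}}{\partial t_k}\cdot\mathbf{B}+\mathbf{A}\cdot\frac{\partial\mathbf{B}}{\partial t_k},\qquad \frac{\partial}{\partial t_k}(\mathbf{A}\times\mathbf{B})=\frac{\partial\mathbf{A}}{\partial t_k}\times\mathbf{B}+\mathbf{A}\times\frac{\partial\mathbf{B}}{\partial t_k}. \qquad (12.11)$$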
These relations can be used to calculate the derivatives of vectors when written
in terms of unit vectors, keeping in mind that the derivative of a unit vector
is not necessarily zero! Only Cartesian unit vectors are constant vectors, and
for purposes of differentiation, it is convenient to write vectors in terms of
these unit vectors, perform the derivative operation, and then substitute for
$\mathbf{e}_x$, $\mathbf{e}_y$, and $\mathbf{e}_z$ in terms of other (spherical or cylindrical) unit vectors.


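Example 12.2.1. A vector whose magnitude is constant is always perpendicular
to its derivative. This can be easily proved as follows:

$$\mathbf{A}\cdot\mathbf{A}=\text{const.}\;\Rightarrow\;\frac{\partial}{\partial t_k}(\mathbf{A}\cdot\mathbf{A})=\frac{\partial}{\partial t_k}(\text{const.})=0.$$

On the other hand, the LHS can be evaluated using the first relation in Equation
(12.11). This gives

$$\frac{\partial}{\partial t_k}(\mathbf{A}\cdot\mathbf{A})=\frac{\partial\mathbf{A}}{\partial t_k}\cdot\mathbf{A}+\mathbf{A}\cdot\frac{\partial\mathbf{A}}{\partial t_k}=2\mathbf{A}\cdot\frac{\partial\mathbf{A}}{\partial t_k}.$$

These two equations together imply that $\mathbf{A}$ and $\partial\mathbf{A}/\partial t_k$ are perpendicular to
one another. ■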


An important consequence of the example above is that 


Box 12.2.1. A unit vector is always perpendicular to its derivative. 
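A quick numerical illustration of Example 12.2.1 and Box 12.2.1: take a vector of fixed length whose direction changes with t, approximate its derivative by a central difference, and check that its dot product with the vector itself vanishes to rounding accuracy. The particular curve A(t) below (length 2, with angles θ = 0.3 + t and φ = 2t²) and the step size are arbitrary choices.

```python
import math


def A(t):
    """A vector of constant magnitude 2 whose direction depends on t."""
    theta, phi = 0.3 + t, 2.0 * t * t
    return (2 * math.sin(theta) * math.cos(phi),
            2 * math.sin(theta) * math.sin(phi),
            2 * math.cos(theta))


def dA_dt(t, h=1e-6):
    """Central-difference approximation to dA/dt, componentwise."""
    plus, minus = A(t + h), A(t - h)
    return tuple((p - m) / (2 * h) for p, m in zip(plus, minus))


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


for t in (0.0, 0.4, 1.1):
    # |A|^2 is constant (4 up to rounding); A . dA/dt should be near 0
    print(t, dot(A(t), A(t)), dot(A(t), dA_dt(t)))
```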


Example 12.2.2. Newton’s second law for a collection of particles leads directly
to the corresponding law for rotational motion. Differentiating the total angular
momentum

$$\mathbf{L}=\sum_{k=1}^{N}\mathbf{r}_k\times\mathbf{p}_k$$

with respect to time and using the second law, $\mathbf{F}_k = d\mathbf{p}_k/dt$, for the $k$th particle,
we get

$$\frac{d\mathbf{L}}{dt}=\sum_{k=1}^{N}\frac{d}{dt}(\mathbf{r}_k\times\mathbf{p}_k)=\sum_{k=1}^{N}(\dot{\mathbf{r}}_k\times\mathbf{p}_k+\mathbf{r}_k\times\dot{\mathbf{p}}_k)=\sum_{k=1}^{N}(0+\mathbf{r}_k\times\mathbf{F}_k)=\mathbf{T},$$

where an overdot indicates the derivative with respect to time and in the last step
we used the definition of torque and the fact that velocity $\dot{\mathbf{r}}_k$ and momentum $\mathbf{p}_k$
have the same direction. ■
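The result dL/dt = T can be checked on a concrete system. The sketch below assumes a hypothetical pair of particles moving as projectiles in a uniform gravitational field of strength G along −y (so F_k = −m_k G e_y and there are no internal forces), differentiates L numerically, and compares the result with the total torque Σ r_k × F_k. Masses, initial conditions, G, and the step size are illustrative assumptions.

```python
G = 9.8  # assumed gravitational acceleration, acting along -y

# Hypothetical system: (mass, initial position, initial velocity) for each particle
PARTICLES = [
    (1.0, (0.0, 1.0, 0.5), (2.0, 3.0, -1.0)),
    (2.5, (1.0, -2.0, 0.0), (-1.0, 0.5, 4.0)),
]


def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def position_momentum(m, r0, v0, t):
    """Projectile motion under the constant force F = (0, -m*G, 0)."""
    r = (r0[0] + v0[0] * t, r0[1] + v0[1] * t - 0.5 * G * t * t, r0[2] + v0[2] * t)
    p = (m * v0[0], m * (v0[1] - G * t), m * v0[2])
    return r, p


def total_angular_momentum(t):
    L = (0.0, 0.0, 0.0)
    for m, r0, v0 in PARTICLES:
        r, p = position_momentum(m, r0, v0, t)
        L = tuple(Li + ci for Li, ci in zip(L, cross(r, p)))
    return L


def total_torque(t):
    T = (0.0, 0.0, 0.0)
    for m, r0, v0 in PARTICLES:
        r, _ = position_momentum(m, r0, v0, t)
        T = tuple(Ti + ci for Ti, ci in zip(T, cross(r, (0.0, -m * G, 0.0))))
    return T


def dL_dt(t, h=1e-5):
    """Central-difference derivative of the total angular momentum."""
    plus, minus = total_angular_momentum(t + h), total_angular_momentum(t - h)
    return tuple((a - b) / (2 * h) for a, b in zip(plus, minus))


print(dL_dt(0.8), total_torque(0.8))
```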

As a special case of the example above, we obtain the law of angular 
momentum conservation: 


Box 12.2.2. When the total torque on a system of particles vanishes, the 
total angular momentum will be a constant of motion. This means that 
its components in a Cartesian coordinate system are constant. 


Since the unit vectors in other coordinate systems are not, in general, constant, 
a constant vector has variable components in these systems. 


12.2.1 Equations of Motion in a Central Force Field

When one discusses central-force problems in mechanics, for instance in the study of planetary motion, one uses spherical coordinates to locate the moving object. Thus, the position vector of the object, say a planet, is given in terms of spherical unit vectors. Newton’s second law, on the other hand, requires a knowledge of the second time-derivative of the position vector.

In this subsection we find the second derivative of the position vector of a moving point particle P with respect to time in spherical coordinates. The coordinates $(r, \theta, \varphi)$ of P are clearly functions of time. First we calculate the velocity and write it in terms of the spherical unit vectors:

$$\mathbf{v} = \frac{d\mathbf{r}}{dt} = \frac{d}{dt}(r\,\mathbf{e}_r) = \dot r\,\mathbf{e}_r + r\frac{d\mathbf{e}_r}{dt}.$$

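We thus have to find the time-derivative of the unit vector $\mathbf{e}_r$. The most straightforward way of taking such a derivative is to use the chain rule:

$$\frac{d\mathbf{e}_r}{dt} = \frac{\partial\mathbf{e}_r}{\partial r}\frac{dr}{dt} + \frac{\partial\mathbf{e}_r}{\partial\theta}\frac{d\theta}{dt} + \frac{\partial\mathbf{e}_r}{\partial\varphi}\frac{d\varphi}{dt} = \dot\theta\frac{\partial\mathbf{e}_r}{\partial\theta} + \dot\varphi\frac{\partial\mathbf{e}_r}{\partial\varphi},$$

where we have used the fact that the spherical unit vectors are independent of $r$ [see Equation (1.39)]. We now evaluate the partial derivatives using (1.39) and noting that the Cartesian unit vectors are constant:

$$\frac{\partial\mathbf{e}_r}{\partial\theta} = \mathbf{e}_x\frac{\partial}{\partial\theta}(\sin\theta\cos\varphi) + \mathbf{e}_y\frac{\partial}{\partial\theta}(\sin\theta\sin\varphi) + \mathbf{e}_z\frac{\partial}{\partial\theta}(\cos\theta) = \mathbf{e}_x\cos\theta\cos\varphi + \mathbf{e}_y\cos\theta\sin\varphi - \mathbf{e}_z\sin\theta. \tag{12.12}$$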


We are interested in writing all vectors in terms of spherical coordinates. A straightforward way is to substitute, for the Cartesian unit vectors above, their expressions in terms of spherical unit vectors. We can easily calculate such expressions using the method introduced at the end of Chapter 1. We leave the details for the reader and merely state the results:

$$\begin{aligned}
\mathbf{e}_x &= \mathbf{e}_r\sin\theta\cos\varphi + \mathbf{e}_\theta\cos\theta\cos\varphi - \mathbf{e}_\varphi\sin\varphi,\\
\mathbf{e}_y &= \mathbf{e}_r\sin\theta\sin\varphi + \mathbf{e}_\theta\cos\theta\sin\varphi + \mathbf{e}_\varphi\cos\varphi,\\
\mathbf{e}_z &= \mathbf{e}_r\cos\theta - \mathbf{e}_\theta\sin\theta.
\end{aligned}\tag{12.13}$$


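Substituting these expressions in the previous equation, we get

$$\begin{aligned}
\frac{\partial\mathbf{e}_r}{\partial\theta} &= (\mathbf{e}_r\sin\theta\cos\varphi + \mathbf{e}_\theta\cos\theta\cos\varphi - \mathbf{e}_\varphi\sin\varphi)\cos\theta\cos\varphi\\
&\quad + (\mathbf{e}_r\sin\theta\sin\varphi + \mathbf{e}_\theta\cos\theta\sin\varphi + \mathbf{e}_\varphi\cos\varphi)\cos\theta\sin\varphi\\
&\quad - (\mathbf{e}_r\cos\theta - \mathbf{e}_\theta\sin\theta)\sin\theta,
\end{aligned}$$

which simplifies to

$$\frac{\partial\mathbf{e}_r}{\partial\theta} = \mathbf{e}_\theta. \tag{12.14}$$

We could have obtained this result immediately by comparing Equation (12.12) with the expression for $\mathbf{e}_\theta$ in Equation (1.39). The other partial derivative is obtained the same way: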


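$$\begin{aligned}
\frac{\partial\mathbf{e}_r}{\partial\varphi} &= \mathbf{e}_x\frac{\partial}{\partial\varphi}(\sin\theta\cos\varphi) + \mathbf{e}_y\frac{\partial}{\partial\varphi}(\sin\theta\sin\varphi) + \mathbf{e}_z\frac{\partial}{\partial\varphi}(\cos\theta)\\
&= -\mathbf{e}_x\sin\theta\sin\varphi + \mathbf{e}_y\sin\theta\cos\varphi\\
&= -(\mathbf{e}_r\sin\theta\cos\varphi + \mathbf{e}_\theta\cos\theta\cos\varphi - \mathbf{e}_\varphi\sin\varphi)\sin\theta\sin\varphi\\
&\quad + (\mathbf{e}_r\sin\theta\sin\varphi + \mathbf{e}_\theta\cos\theta\sin\varphi + \mathbf{e}_\varphi\cos\varphi)\sin\theta\cos\varphi\\
&= \mathbf{e}_\varphi\sin\theta.
\end{aligned}\tag{12.15}$$

Substituting this and Equation (12.14) in the expression for velocity, we obtain

$$\mathbf{v} = \mathbf{e}_r\dot r + r\left(\dot\theta\frac{\partial\mathbf{e}_r}{\partial\theta} + \dot\varphi\frac{\partial\mathbf{e}_r}{\partial\varphi}\right) = \mathbf{e}_r\dot r + \mathbf{e}_\theta r\dot\theta + \mathbf{e}_\varphi r\dot\varphi\sin\theta. \tag{12.16}$$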


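To write the equations of motion, we need to calculate the acceleration, which involves the differentiation of the other unit vectors. The procedure outlined for $\mathbf{e}_r$ can be used to obtain the partial derivatives of the other unit vectors. We collect the results of such calculations, including Equations (12.14) and (12.15), in the following:

$$\begin{aligned}
\frac{\partial\mathbf{e}_r}{\partial r} &= 0, & \frac{\partial\mathbf{e}_r}{\partial\theta} &= \mathbf{e}_\theta, & \frac{\partial\mathbf{e}_r}{\partial\varphi} &= \mathbf{e}_\varphi\sin\theta,\\
\frac{\partial\mathbf{e}_\theta}{\partial r} &= 0, & \frac{\partial\mathbf{e}_\theta}{\partial\theta} &= -\mathbf{e}_r, & \frac{\partial\mathbf{e}_\theta}{\partial\varphi} &= \mathbf{e}_\varphi\cos\theta,\\
\frac{\partial\mathbf{e}_\varphi}{\partial r} &= 0, & \frac{\partial\mathbf{e}_\varphi}{\partial\theta} &= 0, & \frac{\partial\mathbf{e}_\varphi}{\partial\varphi} &= -\mathbf{e}_r\sin\theta - \mathbf{e}_\theta\cos\theta.
\end{aligned}\tag{12.17}$$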


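Similarly, the time-derivatives of the unit vectors are given as follows:

$$\begin{aligned}
\frac{d\mathbf{e}_r}{dt} &= \dot\theta\,\mathbf{e}_\theta + \dot\varphi\sin\theta\,\mathbf{e}_\varphi,\\
\frac{d\mathbf{e}_\theta}{dt} &= -\dot\theta\,\mathbf{e}_r + \dot\varphi\cos\theta\,\mathbf{e}_\varphi,\\
\frac{d\mathbf{e}_\varphi}{dt} &= -\dot\varphi\sin\theta\,\mathbf{e}_r - \dot\varphi\cos\theta\,\mathbf{e}_\theta.
\end{aligned}\tag{12.18}$$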


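Differentiating Equation (12.16) with respect to $t$, inserting (12.18) in the result, and collecting the components, we get

$$\begin{aligned}
\mathbf{a} = \frac{d\mathbf{v}}{dt} &= \mathbf{e}_r\left(\ddot r - r\dot\theta^2 - r\dot\varphi^2\sin^2\theta\right)\\
&\quad + \mathbf{e}_\theta\left(\dot r\dot\theta + \frac{d}{dt}(r\dot\theta) - r\dot\varphi^2\sin\theta\cos\theta\right)\\
&\quad + \mathbf{e}_\varphi\left(\dot r\dot\varphi\sin\theta + r\dot\theta\dot\varphi\cos\theta + \frac{d}{dt}(r\dot\varphi\sin\theta)\right).
\end{aligned}\tag{12.19}$$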




One can use these expressions to write Newton’s second law in spherical 
coordinates. 
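Equation (12.19) can be checked numerically. The sketch below assumes an arbitrary, hypothetical trajectory r(t), θ(t), φ(t) with hand-computed derivatives. It evaluates the three spherical components from (12.19) and compares them with the projections of a finite-difference Cartesian acceleration onto e_r, e_θ, e_φ, which are written in Cartesian components. The trajectory and the step size are illustrative choices only. Agreement to several decimal places is what the derivation predicts.

```python
import math

# Hypothetical trajectory and its exact time derivatives (chosen for illustration).
def r(t):       return 2.0 + 0.5 * math.sin(t)
def r_dot(t):   return 0.5 * math.cos(t)
def r_ddot(t):  return -0.5 * math.sin(t)
def th(t):      return 1.0 + 0.3 * math.cos(2 * t)
def th_dot(t):  return -0.6 * math.sin(2 * t)
def th_ddot(t): return -1.2 * math.cos(2 * t)
def ph(t):      return 0.7 * t + 0.2 * t * t
def ph_dot(t):  return 0.7 + 0.4 * t
def ph_ddot(t): return 0.4


def position(t):
    """Cartesian position of the particle."""
    s, T, P = r(t), th(t), ph(t)
    return (s * math.sin(T) * math.cos(P), s * math.sin(T) * math.sin(P), s * math.cos(T))


def spherical_unit_vectors(T, P):
    """e_r, e_theta, e_phi in Cartesian components."""
    e_r = (math.sin(T) * math.cos(P), math.sin(T) * math.sin(P), math.cos(T))
    e_th = (math.cos(T) * math.cos(P), math.cos(T) * math.sin(P), -math.sin(T))
    e_ph = (-math.sin(P), math.cos(P), 0.0)
    return e_r, e_th, e_ph


def accel_from_12_19(t):
    """Spherical components of the acceleration, Equation (12.19)."""
    R, Rd, Rdd = r(t), r_dot(t), r_ddot(t)
    T, Td, Tdd = th(t), th_dot(t), th_ddot(t)
    Pd, Pdd = ph_dot(t), ph_ddot(t)
    sT, cT = math.sin(T), math.cos(T)
    a_r = Rdd - R * Td**2 - R * Pd**2 * sT**2
    # d/dt(r theta_dot) expanded as r_dot theta_dot + r theta_ddot
    a_th = Rd * Td + (Rd * Td + R * Tdd) - R * Pd**2 * sT * cT
    # d/dt(r phi_dot sin theta) expanded by the product rule
    a_ph = Rd * Pd * sT + R * Td * Pd * cT + (Rd * Pd * sT + R * Pdd * sT + R * Pd * Td * cT)
    return a_r, a_th, a_ph


def accel_numeric(t, h=1e-4):
    """Project a second central difference of the Cartesian position onto the unit vectors."""
    p1, p0, pm = position(t + h), position(t), position(t - h)
    a_cart = [(a - 2 * b + c) / h**2 for a, b, c in zip(p1, p0, pm)]
    return tuple(sum(ai * ei for ai, ei in zip(a_cart, e))
                 for e in spherical_unit_vectors(th(t), ph(t)))


for t in (0.3, 0.8, 1.9):
    print(t, accel_from_12_19(t), accel_numeric(t))
```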

Now suppose that a particle (a planet) is under the influence of a central force, i.e., a force that always points toward, or away from, an origin (the Sun), and has a magnitude that is a function of the distance between the particle and the origin. This means that, in spherical coordinates, the force is of the form $\mathbf{F} = \mathbf{e}_r F(r)$. The second law of motion now yields

$$m\frac{d^2\mathbf{r}}{dt^2} = \mathbf{e}_r F(r) \quad\Longrightarrow\quad \frac{d^2\mathbf{r}}{dt^2} = \mathbf{e}_r\frac{F(r)}{m} \equiv \mathbf{e}_r f(r),$$

which, together with Equation (12.19), gives

$$\begin{aligned}
\ddot r - r\dot\theta^2 - r\dot\varphi^2\sin^2\theta &= f(r),\\
\dot r\dot\theta + \frac{d}{dt}(r\dot\theta) - r\dot\varphi^2\sin\theta\cos\theta &= 0,\\
\dot r\dot\varphi\sin\theta + r\dot\theta\dot\varphi\cos\theta + \frac{d}{dt}(r\dot\varphi\sin\theta) &= 0.
\end{aligned}\tag{12.20}$$


These equations are the starting point of the study of planetary motion. We shall not pursue their solution at this point, but consider some of their general properties, using angular momentum conservation. Since the force has only an $\mathbf{e}_r$ component, its torque vanishes:

$$\mathbf{T} = \mathbf{r}\times\mathbf{F} = r\,\mathbf{e}_r\times(F(r)\,\mathbf{e}_r) = rF(r)\,\mathbf{e}_r\times\mathbf{e}_r = 0.$$


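Therefore, by Box 12.2.2, the angular momentum of the particle relative to the origin is a constant vector. Equation (12.16) now yields

$$\begin{aligned}
\mathbf{L} = \mathbf{r}\times(m\mathbf{v}) &= mr\,\mathbf{e}_r\times\left(\mathbf{e}_r\dot r + \mathbf{e}_\theta r\dot\theta + \mathbf{e}_\varphi r\dot\varphi\sin\theta\right)\\
&= mr^2\,\mathbf{e}_r\times\left(\mathbf{e}_\theta\dot\theta + \mathbf{e}_\varphi\dot\varphi\sin\theta\right) = mr^2\left(\mathbf{e}_\varphi\dot\theta - \mathbf{e}_\theta\dot\varphi\sin\theta\right)\\
&= mr^2\dot\theta\,(-\mathbf{e}_x\sin\varphi + \mathbf{e}_y\cos\varphi)\\
&\quad - mr^2\dot\varphi\sin\theta\,(\mathbf{e}_x\cos\theta\cos\varphi + \mathbf{e}_y\cos\theta\sin\varphi - \mathbf{e}_z\sin\theta)\\
&\equiv L_x\mathbf{e}_x + L_y\mathbf{e}_y + L_z\mathbf{e}_z,
\end{aligned}$$

where $L_x$, $L_y$, and $L_z$ are the constant Cartesian components of angular momentum and $m$ is the mass of the particle. Equating the components of this vectorial relation gives

$$\begin{aligned}
L_x &= -mr^2(\dot\theta\sin\varphi + \dot\varphi\sin\theta\cos\theta\cos\varphi),\\
L_y &= mr^2(\dot\theta\cos\varphi - \dot\varphi\sin\theta\cos\theta\sin\varphi),\\
L_z &= mr^2\dot\varphi\sin^2\theta.
\end{aligned}\tag{12.21}$$

The last equation gives

$$\dot\varphi = \frac{L_z}{mr^2\sin^2\theta}. \tag{12.22}$$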
From all of these relations, we obtain

$$L^2 = L_x^2 + L_y^2 + L_z^2 = m^2r^4\dot\theta^2 + \frac{L_z^2}{\sin^2\theta}. \tag{12.23}$$
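The component formulas (12.21) and the identity (12.23) lend themselves to a direct check: compute L = r × (mv) using a finite-difference velocity and compare the result with the formulas. The sketch assumes a hypothetical mass `m` and a hypothetical trajectory. Nothing in it depends on the force being central, since (12.21) and (12.23) follow from the definitions alone.

```python
import math

m = 1.5  # hypothetical mass


def coords(t):
    """Hypothetical trajectory: (r, theta, phi, r_dot, theta_dot, phi_dot) at time t."""
    r, r_dot = 2.0 + 0.5 * math.sin(t), 0.5 * math.cos(t)
    th, th_dot = 1.0 + 0.3 * math.cos(2 * t), -0.6 * math.sin(2 * t)
    ph, ph_dot = 0.7 * t, 0.7
    return r, th, ph, r_dot, th_dot, ph_dot


def position(t):
    r, th, ph = coords(t)[:3]
    return (r * math.sin(th) * math.cos(ph), r * math.sin(th) * math.sin(ph), r * math.cos(th))


def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1], u[2] * v[0] - u[0] * v[2], u[0] * v[1] - u[1] * v[0])


def L_direct(t, h=1e-6):
    """L = r x (m v), with v from a central difference of the Cartesian position."""
    v = [(a - b) / (2 * h) for a, b in zip(position(t + h), position(t - h))]
    return cross(position(t), [m * vi for vi in v])


def L_from_12_21(t):
    """Cartesian components of L from Equation (12.21)."""
    r, th, ph, _, th_dot, ph_dot = coords(t)
    k = m * r**2
    Lx = -k * (th_dot * math.sin(ph) + ph_dot * math.sin(th) * math.cos(th) * math.cos(ph))
    Ly = k * (th_dot * math.cos(ph) - ph_dot * math.sin(th) * math.cos(th) * math.sin(ph))
    Lz = k * ph_dot * math.sin(th)**2
    return Lx, Ly, Lz


def L_squared_from_12_23(t):
    """Equation (12.23): L^2 = m^2 r^4 theta_dot^2 + L_z^2 / sin^2(theta)."""
    r, th, _, _, th_dot, _ = coords(t)
    Lz = L_from_12_21(t)[2]
    return m**2 * r**4 * th_dot**2 + Lz**2 / math.sin(th)**2


t0 = 1.3
print(L_direct(t0))
print(L_from_12_21(t0))
print(sum(c * c for c in L_from_12_21(t0)), L_squared_from_12_23(t0))
```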

Now suppose that we choose our coordinate axes so that initially, i.e., at $t = 0$, both the position and the velocity vectors of the particle lie in the $xy$-plane. Since $\mathbf{L}$ is perpendicular to both $\mathbf{r}$ and $\mathbf{v}$, it must initially be entirely in the $z$-direction. Conservation of angular momentum implies that $\mathbf{L}$ will always be in the $z$-direction. In particular, $L^2 = L_z^2$. Substituting this in Equation (12.23) yields

$$L_z^2 = m^2r^4\dot\theta^2 + \frac{L_z^2}{\sin^2\theta} \quad\Longrightarrow\quad 0 = m^2r^4\dot\theta^2 + L_z^2\left(\frac{1}{\sin^2\theta} - 1\right),$$

or $0 = m^2r^4\dot\theta^2 + L_z^2\cot^2\theta$. Neither of the two terms on the RHS of this equation is negative. Thus, for their sum to be zero, each term must be zero. It follows that

$$\begin{aligned}
m^2r^4\dot\theta^2 = 0 &\quad\Longrightarrow\quad \dot\theta = 0 \quad\Longrightarrow\quad \theta = \text{const.},\\
L_z^2\cot^2\theta = 0 &\quad\Longrightarrow\quad \cot^2\theta = 0 \quad\Longrightarrow\quad \theta = \pi/2,
\end{aligned}$$

assuming that $r \neq 0$ and $L_z \neq 0$. These relations hold for all times. Thus, the particle is confined to a plane, our $xy$-plane, for eternity! This is why the planets do not wobble “up and down” out of their orbital planes.⁵

If we substitute $\pi/2$ for $\theta$ and use (12.22) for $\dot\varphi$ in Equation (12.20), then the second and third relations are satisfied identically, and the first relation becomes

$$\ddot r - \frac{L_z^2}{m^2r^3} = f(r), \tag{12.24}$$

which is a single differential equation in one variable. The general problem of a particle’s motion in three dimensions has been reduced to a one-dimensional problem.
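Equation (12.24) is easy to integrate numerically once f(r) is specified. Purely for illustration, the sketch below assumes an attractive inverse-square force f(r) = −k/r² and writes ℓ = L_z/m. It advances (r, ṙ) with a classical fourth-order Runge-Kutta step, which is a standard choice and is not prescribed by the text.

Two checks follow from (12.24) itself:

- r = ℓ²/k is an equilibrium of the equation, which corresponds to a circular orbit.
- Multiplying (12.24) by ṙ shows that ½ṙ² + ℓ²/(2r²) − k/r stays constant along every solution.

```python
def radial_accel(r, ell, k):
    """r_ddot from Equation (12.24), with the assumed force f(r) = -k / r**2."""
    return ell**2 / r**3 - k / r**2


def rk4_step(r, v, dt, ell, k):
    """One classical Runge-Kutta step for the first-order system r' = v, v' = radial_accel."""
    def rhs(r_, v_):
        return v_, radial_accel(r_, ell, k)

    s1 = rhs(r, v)
    s2 = rhs(r + 0.5 * dt * s1[0], v + 0.5 * dt * s1[1])
    s3 = rhs(r + 0.5 * dt * s2[0], v + 0.5 * dt * s2[1])
    s4 = rhs(r + dt * s3[0], v + dt * s3[1])
    r_new = r + dt / 6 * (s1[0] + 2 * s2[0] + 2 * s3[0] + s4[0])
    v_new = v + dt / 6 * (s1[1] + 2 * s2[1] + 2 * s3[1] + s4[1])
    return r_new, v_new


def integrate(r0, v0, t_end, dt=1e-3, ell=1.0, k=1.0):
    """Advance (r, r_dot) from t = 0 to t_end with fixed steps of size dt."""
    r, v = r0, v0
    for _ in range(int(round(t_end / dt))):
        r, v = rk4_step(r, v, dt, ell, k)
    return r, v


def radial_energy(r, v, ell=1.0, k=1.0):
    """Conserved along solutions of (12.24): v^2/2 + ell^2/(2 r^2) - k/r."""
    return 0.5 * v**2 + ell**2 / (2 * r**2) - k / r


# With ell = k = 1 the circular-orbit radius is ell**2 / k = 1.
print(integrate(1.0, 0.0, 10.0))       # stays at r = 1, r_dot = 0
r, v = integrate(1.2, 0.0, 10.0)       # a slightly eccentric orbit
print(r, radial_energy(r, v) - radial_energy(1.2, 0.0))
```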


12.3 The Gradient 

Analysis of vectors deals with the derivatives and integrals of vector fields. 
Because of its simplicity, we shall work in a Cartesian coordinate system at 
the beginning, and later generalize to other coordinates. 

In many situations arising in physics, rates of change of certain scalar functions with distance are of importance. For instance, the way potential energy changes as we move in space is directly related to the force producing the potential energy. Similarly, the rate of change (derivative) of the electrostatic potential with respect to distance gives the electrostatic field. The concept of gradient makes precise the vague notion of a derivative with respect to distance.

5 Actually, because of the influence of the other planets, the planets do wobble slightly out of their orbital planes. But this is a very small effect.


Figure 12.7: “Gradient” or differentiation with respect to distance in one dimension.


Let us analyze the notion of differentiation with respect to distance, starting with one variable. In Figure 12.7, a function $f(x)$ has an increment, $\Delta f$, corresponding to a change $\Delta x$ in $x$. If $\Delta x$ is small enough, we can write

$$\Delta f \approx \left(\frac{df}{dx}\right)_{x=x_0}\Delta x.$$

This shows that $(df/dx)_{x=x_0}$ is a measure of how fast the function $f$ is changing at the point $x_0$.

With one variable, there is no ambiguity in defining the derivative, because there is only one line along which we can change $x$, the only coordinate. With two or more variables, the situation is completely different, as illustrated in Figure 12.8. A point $P_0 = (x_0, y_0)$ in the $xy$-plane is shown with the corresponding value of the function, $f(x_0, y_0) = z_0$. Out of the infinitude of points that are close to $P_0$ and cause a change in the function, only three are shown. These indicate how the change in $f(x, y)$ depends on the direction in which the neighboring point is located in relation to $P_0$. For example, if we move in the direction $P_0P_1$, there is very little change in $f(x, y)$. If we move in the direction $P_0P_2$, we notice more change in the function. If we move in the direction $P_0P_3$, the change seems to be maximum. The vector that points in this direction of maximum change, and whose magnitude is the corresponding rate of change, is called the gradient.



Figure 12.8: Gradient or differentiation with respect to distance is shown in two dimensions. The gradient is a vector in the $xy$-plane. Do not think of the surface as a variation in height! It could represent, for instance, the temperature at various points of the $xy$-plane.
Let us use $d\mathbf{r}$ to denote the infinitesimal displacement vector⁶ connecting $P_0$ to a neighboring point in the $xy$-plane. If $f(x, y)$ is differentiable, Equation (2.12) gives

$$df = \left(\frac{\partial f}{\partial x}\right)_{P_0}dx + \left(\frac{\partial f}{\partial y}\right)_{P_0}dy,$$

where $dx$ and $dy$ are the components of the displacement from $P_0$, and $df$ is (approximately) the change in $f$ corresponding to the increments $dx$ and $dy$. We can rewrite this equation as

$$df = (\nabla f)_{P_0}\cdot d\mathbf{r} = |\nabla f|\,|d\mathbf{r}|\cos\theta, \tag{12.25}$$

where, by definition,

$$(\nabla f)_{P_0} \equiv \left(\frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}\right)_{P_0} \tag{12.26}$$

is a vector in the $xy$-plane and $\theta$ is the angle between this vector and $d\mathbf{r}$. For a fixed length $|d\mathbf{r}|$, it is clear that $df$ will be maximum when $\cos\theta = 1$, that is, when $d\mathbf{r}$ is in the direction of $\nabla f$. We conclude, therefore, that $\nabla f$ gives the direction along which $f$ changes most rapidly. The vector in Equation (12.26) is the gradient of $f$ at $P_0$.

The notion of gradient can be generalized to three variables, although it is harder to visualize than the two-variable case. In three dimensions we deal with a function $f(x, y, z)$, which cannot be plotted as in Figure 12.8, and ask which $d\mathbf{r} = (dx, dy, dz)$ maximizes the change in $f$. Once again, the three-dimensional version of Equation (12.25) shows that $d\mathbf{r}$ and

$$\nabla f = \left(\frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}, \frac{\partial f}{\partial z}\right) \tag{12.27}$$

should be in the same direction for $df$ to have a maximum.


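Definition 12.3.1. The gradient of a function $f(x, y, z)$ is defined as

$$\nabla f = \mathbf{e}_x\frac{\partial f}{\partial x} + \mathbf{e}_y\frac{\partial f}{\partial y} + \mathbf{e}_z\frac{\partial f}{\partial z}.$$

For the same small displacement $|\Delta\mathbf{r}|$, the change in $f$ is maximum when $\Delta\mathbf{r}$ is in the direction of $\nabla f$.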
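The statement that ∇f points along the direction of fastest increase can be tested numerically. The sketch below uses a hypothetical field f(x, y, z) = x²y + sin z and approximates the gradient by central differences. It then compares the rate of change of f along ∇f with the rate along many random unit directions. By (12.25), none of the random directions should give a rate larger than |∇f|.

```python
import math
import random


def f(x, y, z):
    """Hypothetical scalar field used for illustration."""
    return x**2 * y + math.sin(z)


def grad(func, p, h=1e-6):
    """Central-difference approximation of (df/dx, df/dy, df/dz) at the point p."""
    g = []
    for i in range(len(p)):
        plus, minus = list(p), list(p)
        plus[i] += h
        minus[i] -= h
        g.append((func(*plus) - func(*minus)) / (2 * h))
    return g


def rate_along(func, p, u, ds=1e-6):
    """Rate of change of func per unit distance along the unit vector u."""
    ahead = [pi + ds * ui for pi, ui in zip(p, u)]
    behind = [pi - ds * ui for pi, ui in zip(p, u)]
    return (func(*ahead) - func(*behind)) / (2 * ds)


def normalize(v):
    n = math.sqrt(sum(c * c for c in v))
    return [c / n for c in v]


P0 = (1.0, 2.0, 0.5)
g = grad(f, P0)
g_norm = math.sqrt(sum(c * c for c in g))
print(g)                                   # close to (2xy, x^2, cos z) = (4, 1, cos 0.5)
print(rate_along(f, P0, normalize(g)), g_norm)

random.seed(1)
best = max(rate_along(f, P0, normalize([random.gauss(0, 1) for _ in range(3)]))
           for _ in range(2000))
print(best <= g_norm + 1e-6)               # no random direction beats the gradient
```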

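Example 12.3.1. As an example, let us find the gradient of the function

$$V(x, y, z) = f(r) = f\left(\sqrt{x^2 + y^2 + z^2}\right)$$

(which depends on $r$ alone) at a point $P$ with Cartesian coordinates $(x, y, z)$. Using the chain rule, and noting that $\partial r/\partial x = x/r$, $\partial r/\partial y = y/r$, and $\partial r/\partial z = z/r$, we have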

6 A better notation is $\Delta\mathbf{r}$. However, since there is no difference between the differential and the increment of an independent variable, and since eventually we will be interested in differentials, we use the latter notation.
$$\nabla V = \mathbf{e}_x\frac{\partial V}{\partial x} + \mathbf{e}_y\frac{\partial V}{\partial y} + \mathbf{e}_z\frac{\partial V}{\partial z} = \left(f'(r)\frac{x}{r}, f'(r)\frac{y}{r}, f'(r)\frac{z}{r}\right) = \frac{f'(r)}{r}(x, y, z) = f'(r)\frac{\mathbf{r}}{r}.$$

The last equality shows that, for functions that depend on $r$ alone, the gradient is proportional to the position vector of the point $P$, i.e., it is radial. ■
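The result of Example 12.3.1 can be confirmed numerically. The sketch takes the hypothetical radial profile f(r) = e^(−r), differentiates V = f(√(x² + y² + z²)) by central differences, and compares the result with f′(r) **r**/r. The cross product of the numerical gradient with the position vector should also vanish, which shows that the gradient is radial.

```python
import math


def f(r):
    return math.exp(-r)             # hypothetical radial profile


def f_prime(r):
    return -math.exp(-r)


def V(x, y, z):
    return f(math.sqrt(x * x + y * y + z * z))


def numeric_grad(func, p, h=1e-6):
    """Central-difference gradient of func at p."""
    out = []
    for i in range(3):
        plus, minus = list(p), list(p)
        plus[i] += h
        minus[i] -= h
        out.append((func(*plus) - func(*minus)) / (2 * h))
    return out


def radial_formula(p):
    """f'(r) r_vec / r, the result of Example 12.3.1."""
    r = math.sqrt(sum(c * c for c in p))
    return [f_prime(r) * c / r for c in p]


def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1], u[2] * v[0] - u[0] * v[2], u[0] * v[1] - u[1] * v[0])


P = (1.0, 2.0, 2.0)                  # r = 3 at this point
print(numeric_grad(V, P))
print(radial_formula(P))             # -e^{-3} (1, 2, 2) / 3
print(cross(numeric_grad(V, P), P))  # essentially the zero vector
```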


Given a scalar function $f(x, y, z)$, we can consider surfaces on which this function maintains a constant value. If that constant value is $C$, the surface will be described by $f(x, y, z) = C$. One can, in principle, solve this equation for $z$ to find the explicit dependence of $z$ on $x$ and $y$. However, we are interested in the implicit dependence given above. Now consider two nearby points $P_1$ and $P_2$ on the surface with coordinates $(x, y, z)$ and $(x + \Delta x, y + \Delta y, z + \Delta z)$, respectively. We have

$$f(x, y, z) = f(x + \Delta x, y + \Delta y, z + \Delta z) \quad\Longrightarrow\quad f(x, y, z) = f(x, y, z) + \Delta f,$$

or $0 = \Delta f \approx \frac{\partial f}{\partial x}\Delta x + \frac{\partial f}{\partial y}\Delta y + \frac{\partial f}{\partial z}\Delta z = \nabla f\cdot\Delta\mathbf{r}$, if the increments of the coordinates are small. This relation shows that $\nabla f$ is perpendicular to the displacement from $P_1$ to $P_2$. The same argument applies to a curve $g(x, y) = C$: the two-dimensional gradient is perpendicular to the displacement from $P_1$ to $P_2$, both being points on the curve. Since $P_1$ and $P_2$ are arbitrary nearby points of the surface (or curve), we conclude that


Theorem 12.3.2. The gradient $\nabla f$ is perpendicular to all surfaces $f(x, y, z) = C$ for different $C$’s. Similarly, $\nabla g$ is perpendicular to all curves $g(x, y) = C$.

For example, as we shall see later, the electrostatic field is the negative of the gradient of the electrostatic potential. Therefore, the electrostatic field is perpendicular to surfaces of constant potential, such as the surfaces of conductors.

Example 12.3.3. The perpendicularity property of the gradient can be used to find the equation of the tangent plane to a surface $z = g(x, y)$ at a point $P$ with coordinates $(x_0, y_0, z_0)$. This surface can be written as

$$f(x, y, z) = z - g(x, y) = 0.$$

Then the normal to the surface at $P$, which is the same as the normal to the tangent plane at $P$, is the gradient of $f$ at $P$:

$$(\nabla f)_P = \left(\frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}, \frac{\partial f}{\partial z}\right)_P = \left(-\frac{\partial g}{\partial x}, -\frac{\partial g}{\partial y}, 1\right)_P.$$

A point of the tangent plane at $P$ is completely determined by the property that its displacement vector $\Delta\mathbf{r}$ from $P$ should be perpendicular to the gradient at $P$ (see Figure 12.9). If we denote the position vector of $P$ by $\mathbf{r}_0$ and that of the point on the plane by $\mathbf{r} = (x, y, z)$, then the equation of the tangent plane is given by

$$(\mathbf{r} - \mathbf{r}_0)\cdot(\nabla f)_P = 0 \quad\Longrightarrow\quad -(x - x_0)\left(\frac{\partial g}{\partial x}\right)_P - (y - y_0)\left(\frac{\partial g}{\partial y}\right)_P + (z - z_0) = 0$$

or

$$z - z_0 = (x - x_0)\left(\frac{\partial g}{\partial x}\right)_P + (y - y_0)\left(\frac{\partial g}{\partial y}\right)_P. \qquad \blacksquare$$

Figure 12.9: The plane tangent to the surface $z = g(x, y)$ at $P$.
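The tangent-plane formula of Example 12.3.3 can be turned into a small function. The sketch below estimates ∂g/∂x and ∂g/∂y at P by central differences and returns the plane z = z₀ + (x − x₀)g_x + (y − y₀)g_y, together with the normal (∇f)_P. The surface g(x, y) = x² + 3xy and the point (1, 2) are hypothetical choices. For them the exact plane is z = 7 + 8(x − 1) + 3(y − 2).

```python
def tangent_plane(g, x0, y0, h=1e-6):
    """Return the tangent plane of z = g(x, y) at (x0, y0) and the normal (grad f)_P."""
    z0 = g(x0, y0)
    gx = (g(x0 + h, y0) - g(x0 - h, y0)) / (2 * h)
    gy = (g(x0, y0 + h) - g(x0, y0 - h)) / (2 * h)

    def plane(x, y):
        return z0 + (x - x0) * gx + (y - y0) * gy

    return plane, (-gx, -gy, 1.0)


def g(x, y):
    return x**2 + 3 * x * y          # hypothetical surface z = g(x, y)


plane, normal = tangent_plane(g, 1.0, 2.0)
print(plane(1.0, 2.0))               # 7.0: the plane passes through P
print(plane(2.0, 3.0))               # about 7 + 8 + 3 = 18
print(normal)                        # about (-8, -3, 1)
# Near P the surface and the plane differ only at second order in the displacement:
print(g(1.001, 2.001) - plane(1.001, 2.001))   # about 4e-6
```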

It is convenient to introduce a differentiation operator which we shall use later.

Definition 12.3.2. The symbol $\nabla$ can be thought of as a vector operator, called del or nabla, whose components are $\partial/\partial x$, $\partial/\partial y$, and $\partial/\partial z$. Thus, we can write

$$\nabla = \mathbf{e}_x\frac{\partial}{\partial x} + \mathbf{e}_y\frac{\partial}{\partial y} + \mathbf{e}_z\frac{\partial}{\partial z}. \tag{12.28}$$

This vector operator $\nabla$ operates on differentiable scalar functions and produces vector fields.


12.3.1 Gradient and Extremum Problems 


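The gradient is a natural tool for finding the maxima and minima of functions of several variables. A differentiable function $f(\mathbf{x})$ of $n$ variables $\mathbf{x} = (x_1, x_2, \ldots, x_n)$ can have a local extremum (maximum or minimum) at an interior point $\mathbf{a}$ only if its differential vanishes at that point for arbitrary $d\mathbf{x}$:

$$df = \left(\frac{\partial f}{\partial x_1}\right)_{\mathbf{a}}dx_1 + \left(\frac{\partial f}{\partial x_2}\right)_{\mathbf{a}}dx_2 + \cdots + \left(\frac{\partial f}{\partial x_n}\right)_{\mathbf{a}}dx_n = (\nabla f(\mathbf{a}))\cdot d\mathbf{x} = 0,$$

where

$$\nabla f = \left(\frac{\partial f}{\partial x_1}, \frac{\partial f}{\partial x_2}, \ldots, \frac{\partial f}{\partial x_n}\right) \quad\text{and}\quad d\mathbf{x} = (dx_1, dx_2, \ldots, dx_n).$$

If the dot product of $\nabla f(\mathbf{a})$ and $d\mathbf{x}$ is to vanish for arbitrary $d\mathbf{x}$, then $\nabla f(\mathbf{a})$ must be zero. Thus, for $f$ to have an extremum at $\mathbf{a}$, we must have

$$\nabla f(\mathbf{a}) = 0 \quad\text{or}\quad \left(\frac{\partial f}{\partial x_i}\right)_{\mathbf{a}} = 0, \quad i = 1, 2, \ldots, n. \tag{12.29}$$

This is the generalization to $n$ variables of the familiar condition $f'(a) = 0$ from calculus. As in one variable, the condition is necessary but not sufficient: the gradient also vanishes at saddle points.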

In many situations, there are auxiliary conditions or constraints imposed on the independent variables. For example, let $P_1$, $Q$, and $P_2$ be three points in space, with $P_1$ and $P_2$ fixed but $Q$ allowed to move. Consider the path $P_1QP_2$ consisting of the straight line segments $P_1Q$ and $QP_2$. What choice of $Q$ gives the shortest path? If we denote the coordinates of $Q$ by $(x, y, z)$ and those of $P_1$ and $P_2$ with obvious subscripts, then we have to find the extremum of

$$f(x, y, z) = \sqrt{(x - x_1)^2 + (y - y_1)^2 + (z - z_1)^2} + \sqrt{(x - x_2)^2 + (y - y_2)^2 + (z - z_2)^2}.$$

So we set the partial derivatives equal to zero and solve for $(x, y, z)$. The answer, as expected, turns out to be the path for which $Q$ lies on the line segment $P_1P_2$ between $P_1$ and $P_2$.
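For the unconstrained shortest-path problem, the gradient of f is the sum of two unit vectors: the one pointing from P₁ to Q and the one pointing from P₂ to Q. The sketch below uses hypothetical endpoints. It confirms that this gradient vanishes for a Q on the segment P₁P₂, where the two unit vectors are opposite, and that it does not vanish for a Q off the segment.

```python
import math


def sub(a, b):
    return [ai - bi for ai, bi in zip(a, b)]


def norm(a):
    return math.sqrt(sum(c * c for c in a))


def grad_path_length(Q, P1, P2):
    """Gradient of f(Q) = |Q - P1| + |Q - P2|; defined for Q different from P1 and P2."""
    d1, d2 = sub(Q, P1), sub(Q, P2)
    n1, n2 = norm(d1), norm(d2)
    return [a / n1 + b / n2 for a, b in zip(d1, d2)]


P1, P2 = [0.0, 0.0, 0.0], [4.0, 2.0, -2.0]   # hypothetical fixed endpoints
on_segment = [0.25 * c for c in P2]           # a quarter of the way from P1 to P2
off_segment = [1.0, 1.0, 1.0]

print(norm(grad_path_length(on_segment, P1, P2)))   # essentially zero
print(norm(grad_path_length(off_segment, P1, P2)))  # clearly nonzero
```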

Now suppose we demand that $Q$ lie on a sphere of radius $a$ centered at the origin. Then the problem becomes extremizing $f(x, y, z)$ with the constraint condition that

$$g(x, y, z) = x^2 + y^2 + z^2 - a^2 = 0.$$

To solve this problem, we could solve the constraint equation for one of the variables in terms of the other two, substitute in $f(x, y, z)$, and solve the resulting two-variable problem. But there is a much more elegant way involving gradients, which we discuss now.

Suppose that we want to find the extremum of a function $f(\mathbf{x})$ of $n$ variables $\mathbf{x} = (x_1, x_2, \ldots, x_n)$ subject to the condition that $\mathbf{x}$ must lie on the hypersurface $g(\mathbf{x}) = 0$. We cannot set $\nabla f$ equal to zero because $d\mathbf{x}$ is no longer arbitrary.

With the constraint, $d\mathbf{x}$ is confined to the surface $g(\mathbf{x}) = 0$. Now, the only $n$-dimensional vector which has a vanishing dot product with every $d\mathbf{x}$ on the constrained surface is (a multiple of) the normal to the surface. Therefore, if $(\nabla f)\cdot d\mathbf{x}$ is to be zero for every $d\mathbf{x}$ lying on the surface, then $\nabla f$ must be a multiple of the normal to the surface $g(\mathbf{x}) = 0$. But this normal is nothing but $\nabla g$. Therefore, if $f$ is to have an extremum subject to the constraint $g(\mathbf{x}) = 0$, then it must obey the following equation:

$$\nabla f = -\lambda\nabla g \quad\text{or}\quad \nabla f + \lambda\nabla g = 0,$$

where $\lambda$ is an undetermined constant called the Lagrange multiplier. This equation shows that to find the extremum of the function $f$ with constraint $g(\mathbf{x}) = 0$, one can define the function $F$ of $n + 1$ variables

$$F(x_1, x_2, \ldots, x_n; \lambda) = f(x_1, x_2, \ldots, x_n) + \lambda g(x_1, x_2, \ldots, x_n)$$

and extremize it without constraint. Then we have

$$\begin{aligned}
\frac{\partial F}{\partial x_i} &= \frac{\partial f}{\partial x_i} + \lambda\frac{\partial g}{\partial x_i} = 0, \quad i = 1, 2, \ldots, n,\\
\frac{\partial F}{\partial\lambda} &= g(x_1, x_2, \ldots, x_n) = 0.
\end{aligned}\tag{12.30}$$

The last equation is just the constraint condition, but it comes out conveniently as one of the extremal equations of $F$.

Example 12.3.4. A rectangular box is to be made out of a given amount $A$ of material (i.e., with total surface area $A$) so as to have the largest volume. What dimensions should the box have? Here $f(x, y, z) = xyz$, the volume, and $g(x, y, z) = 2xy + 2xz + 2yz - A$. Setting the components of the gradient of

$$F(x, y, z; \lambda) = xyz + 2\lambda(xy + xz + yz - A/2)$$

equal to zero yields four equations:

$$\begin{aligned}
yz + 2\lambda(y + z) &= 0,\\
xz + 2\lambda(x + z) &= 0,\\
xy + 2\lambda(x + y) &= 0,\\
2(xy + xz + yz) - A &= 0.
\end{aligned}$$

Multiplying the first equation by $x$ and the second equation by $y$ and subtracting yields

$$2\lambda x(y + z) - 2\lambda y(x + z) = 2\lambda z(x - y) = 0, \quad\text{or}\quad x = y,$$

because neither $\lambda$ nor $z$ can vanish for a box of nonzero volume. Similarly, from the second and third equations we get $y = z$. So, the box should be a cube. The last equation then gives

$$6x^2 - A = 0, \quad\text{or}\quad x = y = z = \sqrt{A/6}.$$

Substituting this in any of the above equations involving $\lambda$ yields $\lambda = -\tfrac{1}{4}\sqrt{A/6}$. ■
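The solution of Example 12.3.4 can be verified in two independent ways:

- Substitute x = y = z = √(A/6) and λ = −¼√(A/6) into the four equations and confirm that each one vanishes.
- Use the constraint to eliminate z and search a grid of (x, y) values by brute force for the largest volume.

The value A = 6, which makes the optimal cube have unit side, and the grid resolution are arbitrary choices for this sketch.

```python
import math

A = 6.0  # hypothetical amount of material (total surface area)


def lagrange_equations(x, y, z, lam):
    """Left-hand sides of the four equations obtained by setting grad F = 0."""
    return (y * z + 2 * lam * (y + z),
            x * z + 2 * lam * (x + z),
            x * y + 2 * lam * (x + y),
            2 * (x * y + x * z + y * z) - A)


side = math.sqrt(A / 6)
lam = -side / 4
print(lagrange_equations(side, side, side, lam))   # all four vanish

# Brute force: solve the constraint for z and scan a grid of (x, y).
best_volume, best_dims = 0.0, None
n = 300
for i in range(1, n):
    for j in range(1, n):
        x, y = 3.0 * i / n, 3.0 * j / n
        z = (A / 2 - x * y) / (x + y)
        if z > 0 and x * y * z > best_volume:
            best_volume, best_dims = x * y * z, (x, y, z)
print(best_volume, best_dims)                       # the cube of side 1
```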
The extremal problems may have several constraint equations, such as

$$g_j(x_1, x_2, \ldots, x_n) \equiv g_j(\mathbf{x}) = 0, \quad j = 1, 2, \ldots, m. \tag{12.31}$$

We can “eliminate” the first constraint by replacing $f(x_1, x_2, \ldots, x_n)$ with

$$F_1(\mathbf{x}; \lambda_1) = f(\mathbf{x}) + \lambda_1 g_1(\mathbf{x}),$$

where $F_1$ has only $m - 1$ constraint equations. Now eliminate the second constraint by defining

$$F_2(\mathbf{x}; \lambda_1, \lambda_2) = F_1(\mathbf{x}; \lambda_1) + \lambda_2 g_2(\mathbf{x}) = f(\mathbf{x}) + \lambda_1 g_1(\mathbf{x}) + \lambda_2 g_2(\mathbf{x}).$$

Continuing, we can eliminate all constraints by defining

$$F(\mathbf{x}; \lambda_1, \lambda_2, \ldots, \lambda_m) = f(\mathbf{x}) + \sum_{j=1}^{m}\lambda_j g_j(\mathbf{x}), \tag{12.32}$$

whose unconstrained extremization yields the extremal equations.
Vectors and Derivatives


12.1. Find directly the solid angle subtended by a disk of radius a at a point 
P on its perpendicular axis located a distance b from the center. 

12.2. A closed curve $\rho = 3a + a\cos\varphi$ in cylindrical coordinates bounds a region in the $xy$-plane. Find the solid angle subtended by this region at a point $P$ on the $z$-axis a distance $2a$ above the $xy$-plane.

12.3. Derive Equation (12.11). 

12.4. Show that when a moving particle is confined to a circle, its velocity is 
always perpendicular to its radius. If, furthermore, the speed of the particle 
is constant, then its acceleration is radial. 

12.5. Derive Equations (12.17) and (12.18). 

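12.6. The vectors $\mathbf{a}$ and $\mathbf{b}$ are given by

$$\mathbf{a} = u\,\mathbf{e}_x + v\,\mathbf{e}_y, \qquad \mathbf{b} = v\,\mathbf{e}_x - u\,\mathbf{e}_y.$$

(a) Write $\mathbf{e}_a$ and $\mathbf{e}_b$ in terms of Cartesian unit vectors.

(b) Find the four vectors $\partial\mathbf{e}_a/\partial u$, $\partial\mathbf{e}_a/\partial v$, $\partial\mathbf{e}_b/\partial u$, and $\partial\mathbf{e}_b/\partial v$ in terms of Cartesian unit vectors.

(c) Express $\mathbf{e}_x$ and $\mathbf{e}_y$ in terms of $\mathbf{e}_a$ and $\mathbf{e}_b$.

(d) Express the four vectors $\partial\mathbf{e}_a/\partial u$, $\partial\mathbf{e}_a/\partial v$, $\partial\mathbf{e}_b/\partial u$, and $\partial\mathbf{e}_b/\partial v$ in terms of $\mathbf{e}_a$ and $\mathbf{e}_b$.

(e) If $u$ and $v$ are functions of time, find $d\mathbf{e}_a/dt$ and $d\mathbf{e}_b/dt$ in terms of $\mathbf{e}_a$ and $\mathbf{e}_b$.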

12.7. Derive Equation (12.19). 

12.8. Derive Equation (12.23). 

12.9. Show that (12.22) and the assumption $\theta = \pi/2$ solve the last two equations of (12.20) and reduce the first one to (12.24).

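12.10. (a) Obtain the time derivatives of the cylindrical unit vectors:

$$\frac{d\mathbf{e}_\rho}{dt} = \dot\varphi\,\mathbf{e}_\varphi, \qquad \frac{d\mathbf{e}_\varphi}{dt} = -\dot\varphi\,\mathbf{e}_\rho, \qquad \frac{d\mathbf{e}_z}{dt} = 0.$$

(b) Use the result of (a) to show that if $\mathbf{A}$ is a vector written in terms of cylindrical unit vectors, then

$$\frac{d\mathbf{A}}{dt} = \left(\frac{dA_\rho}{dt} - A_\varphi\dot\varphi\right)\mathbf{e}_\rho + \left(\frac{dA_\varphi}{dt} + A_\rho\dot\varphi\right)\mathbf{e}_\varphi + \frac{dA_z}{dt}\,\mathbf{e}_z.$$

12.11. A surface is given by $\dfrac{x^2}{a^2} + \dfrac{y^2}{4a^2} + \dfrac{z^2}{2a^2} = 1$. Find the unit normal to the surface and the equation of the tangent plane at $(a/2, a, a)$.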
12.12. The potential of a certain charge distribution is given by

. . o y 2 x 2 

$>(x,y,z) = z + — + 


(a) Find the electric field $\mathbf{E} = -\nabla\Phi$ at $(3/\sqrt{2}, 1, 1/2)$ and show that it is normal to the surface

(b) Show that the electric field is normal at every point of this surface.

(c) Show that the electric field is normal at every point of the surface obtained by replacing 1 on the RHS of the last equation by any arbitrary constant.


12.13. Show that $\nabla(fg) = (\nabla f)g + f(\nabla g)$ for any two (differentiable) functions $f$ and $g$ of $(x, y, z)$.

12.14. Consider the plane $ax + by + cz = d$ and a point $P = (x_0, y_0, z_0)$ not lying in the plane. Use Lagrange multipliers to show that the parametric equation of the line passing through $P$ that gives the minimum distance to the plane is

$$\mathbf{r} = \mathbf{r}_0 + t\mathbf{n}, \quad\text{where}\quad \mathbf{r} = (x, y, z),\ \mathbf{r}_0 = (x_0, y_0, z_0),\ \mathbf{n} = (a, b, c). \tag{12.33}$$

From this deduce that the distance from $P$ to the plane is

$$\frac{|d - ax_0 - by_0 - cz_0|}{\sqrt{a^2 + b^2 + c^2}}.$$

Hint: Take the dot product of (12.33) with $\mathbf{n}$ and use the fact that $\mathbf{n}\cdot\mathbf{r} = d$ when the tip of $\mathbf{r}$ is in the plane.

12.15. Consider the sphere $(x - a)^2 + (y - b)^2 + (z - c)^2 = d^2$ and a point $P = (x_0, y_0, z_0)$ not lying on the sphere. Use Lagrange multipliers to show that the shortest line segment connecting $P$ to the sphere is the one which, when extended, passes through the center of the sphere.

12.16. For a vector $\mathbf{A}(\mathbf{r}, t)$ that is a function of position and time, show that

$$d\mathbf{A} = (d\mathbf{r}\cdot\nabla)\mathbf{A} + \frac{\partial\mathbf{A}}{\partial t}\,dt.$$


12.17. Find the gradient of

$$u(x, y, z, x', y', z') = u(\mathbf{r} - \mathbf{r}') = |\mathbf{r} - \mathbf{r}'|^m,$$

first with respect to the components of $\mathbf{r}$ and then with respect to the components of $\mathbf{r}'$, and write the answer completely in terms of $\mathbf{r}$ and $\mathbf{r}'$. What is the answer when $m = -1$?




Chapter 13 


Flux and Divergence 


A vector field is a function with direction, and because of this directional 
property, many new kinds of differentiation and integration can be performed 
on it. For instance, a vector field can be made to pierce a surface or an element 
thereof, and as it pierces that surface its variation from point to point can be 
monitored. This leads to one kind of differentiation and integration which we 
discuss next. The integration leads to the concept of the flux of a vector field, 
and the associated differentiation to the notion of divergence. 


13.1 Flux of a Vector Field 


The paradigm of the concept of flux is that of the velocity field of a fluid (see Figure 13.1). A small ring of area $\Delta a$ is situated in the flow. How much fluid is passing through the ring per unit time? It is clear that the answer depends on the density of the fluid,¹ the speed of the fluid, the size of the area $\Delta a$, and also on the relative orientation of the direction of the flow and the unit normal to the area, denoted by $\mathbf{e}_n$. A little contemplation reveals that the amount of fluid of constant unit density passing through $\Delta a$ per unit time is proportional to²

$$\Delta\phi = \mathbf{v}\cdot\mathbf{e}_n\,\Delta a = \mathbf{v}\cdot\Delta\mathbf{a}, \tag{13.1}$$

where $\Delta\phi$ is called the flux of $\mathbf{v}$ through $\Delta a$, and $\Delta\mathbf{a}$ is defined to be $\mathbf{e}_n\Delta a$. If the ring is replaced by a large surface $S$, then we have to divide the surface into small areas (not necessarily in the shape of a ring) and sum up the contribution of each area to the flux. In the limit of smaller and smaller areas and larger and larger numbers of such areas, we obtain an integral:

$$\phi = \lim_{\substack{\Delta a_i\to 0\\ N\to\infty}}\sum_{i=1}^{N}\mathbf{v}_i\cdot\mathbf{e}_{n_i}\Delta a_i = \lim_{\substack{\Delta a_i\to 0\\ N\to\infty}}\sum_{i=1}^{N}\mathbf{v}_i\cdot\Delta\mathbf{a}_i = \iint_S\mathbf{v}\cdot d\mathbf{a}, \tag{13.2}$$

where $\phi$ is the total flux through $S$.

1 For simplicity we assume that density is constant and we take it to be 1. 

2 We shall come back to a rigorous derivation of the flow of a substance through a small 
loop later (see the discussion after Theorem 13.2.2). 


Figure 13.1: Flux of velocity vector through a small area $\Delta a$.

There is an arbitrariness in the direction of the unit vector normal to an element of area, because for any unit normal there is another which points in the opposite direction. The fluxes for these two unit normals have opposite signs. It may therefore appear that one could arbitrarily choose every one of the unit normals $\mathbf{e}_{n_i}$ in the sum (13.2) to have either of the two opposite orientations, leading to an arbitrary result for the integral. This is not the case, because the direction of the unit normal to an element of area is determined by the neighboring unit normals and the requirement of continuity. Once the choice is made between the two possible unit normals for one element of area of the surface $S$, say the first one $\mathbf{e}_{n_1}$, the normal to a neighboring element can differ only slightly from $\mathbf{e}_{n_1}$; in particular, it cannot be of opposite sign. The next one should point in almost the same direction as that one, and so on. This requirement of continuity uniquely determines the remaining unit normals. However, the initial choice remains arbitrary. Since the two orientations of the initial choice differ by a sign, the two total fluxes corresponding to these two orientations also differ by a sign. We shall see shortly, however, that for closed surfaces such an arbitrariness in sign can be overcome by convention.

The discussion above works for orientable surfaces. A surface is orientable if a normal vector, displaced continuously along any closed loop lying entirely on the surface, returns to its original direction after one complete circuit. It is clear that the lateral surface of a cylinder is orientable.

A cylinder is obtained by gluing two opposite edges of a rectangle. Now take the same rectangle and give one of the (shorter) edges a half twist before gluing it to the opposite edge. The result, which the reader may want to construct, is a very famous mathematical surface called the Möbius band. A Möbius band is not orientable. If one starts at the midpoint of the glued edges and moves perpendicular to it along the large circle (the length of the original rectangle), then a unit normal displaced continuously and completely along the circle will be flipped.³ In this book we shall never encounter nonorientable surfaces.

3 The reader is urged to perform this surprising experiment using a (portion of a) toothpick as a unit normal.
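The flip of the normal on a Möbius band can also be seen numerically. The sketch below uses a standard parametrization of the band, which is an assumption here, since the text builds the band from a paper rectangle. It computes the unit normal ∂**r**/∂u × ∂**r**/∂v along the center circle v = 0 and compares the normal at the start of one complete circuit with the normal at its end. For a cylinder the two agree (dot product +1). For the Möbius band they are opposite (dot product −1).

```python
import math


def mobius(u, v):
    """Möbius band: a strip of half-width 1/2 that makes a half twist as u runs over [0, 2*pi]."""
    w = 1 + 0.5 * v * math.cos(u / 2)
    return (w * math.cos(u), w * math.sin(u), 0.5 * v * math.sin(u / 2))


def cylinder(u, v):
    return (math.cos(u), math.sin(u), v)


def unit_normal(surface, u, v, h=1e-6):
    """Unit normal from the cross product of the tangent vectors dr/du and dr/dv."""
    ru = [(a - b) / (2 * h) for a, b in zip(surface(u + h, v), surface(u - h, v))]
    rv = [(a - b) / (2 * h) for a, b in zip(surface(u, v + h), surface(u, v - h))]
    n = (ru[1] * rv[2] - ru[2] * rv[1],
         ru[2] * rv[0] - ru[0] * rv[2],
         ru[0] * rv[1] - ru[1] * rv[0])
    length = math.sqrt(sum(c * c for c in n))
    return [c / length for c in n]


def transported_dot(surface):
    """Dot product of the normals at the start and end of one circuit of the loop v = 0."""
    start = unit_normal(surface, 0.0, 0.0)
    end = unit_normal(surface, 2 * math.pi, 0.0)
    return sum(a * b for a, b in zip(start, end))


print(transported_dot(cylinder))   # +1: the normal returns unchanged
print(transported_dot(mobius))     # -1: the normal comes back flipped
```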
Example 13.1.1. Consider the flow of a river and assume that the velocity of the water is given by

$$\mathbf{v} = v_0\left(1 - \frac{4x^2}{w^2}\right)\mathbf{e}_z,$$

where $x$ is the distance from the midpoint of the river and $w$ is the width of the river. Let us find the flux of the velocity, assuming that the cross section of the river is a rectangle with depth equal to $h$, as shown in Figure 13.2.

The normal to the area $d\mathbf{a}$ is perpendicular to the $xy$-plane and is in the same direction as the velocity. Thus, we have $\mathbf{v}\cdot d\mathbf{a} = v\,da = v\,dx\,dy$, and

$$\phi = \iint_S v\,dx\,dy = \int_0^h dy\int_{-w/2}^{w/2}v_0\left(1 - \frac{4x^2}{w^2}\right)dx = hv_0\left(w - \frac{w}{3}\right) = \frac{2}{3}Av_0,$$

where $S$ is the cross section of the river and $A$ is its area. ■
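The flux integral (13.2) can be approximated directly by a Riemann sum. The sketch below applies the midpoint rule to the river of Example 13.1.1. The values v₀ = 2, w = 10, h = 3 and the grid sizes are hypothetical. The result should approach (2/3)Av₀ = (2/3)(30)(2) = 40.

```python
def river_speed(x, v0, w):
    """Flow speed v0 (1 - 4 x^2 / w^2), with x measured from the middle of the river."""
    return v0 * (1 - 4 * x**2 / w**2)


def river_flux(v0, w, h, nx=400, ny=10):
    """Midpoint Riemann sum of v . da = v dx dy over the w-by-h cross section."""
    dx, dy = w / nx, h / ny
    total = 0.0
    for i in range(nx):
        x = -w / 2 + (i + 0.5) * dx
        for _ in range(ny):          # the speed does not depend on the depth coordinate y
            total += river_speed(x, v0, w) * dx * dy
    return total


v0, w, h = 2.0, 10.0, 3.0
print(river_flux(v0, w, h), 2 / 3 * (w * h) * v0)   # both close to 40
```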


The concept of flux, although indicative of a flow, is not limited to the velocity vector field. We can define the flux of any vector field $\mathbf{A}$ in exactly the same way:

$$\phi_{\mathbf{A}} = \iint_S\mathbf{A}\cdot d\mathbf{a}. \tag{13.3}$$

Whether such a definition is useful or not should be determined by experiment. It turns out that the flux of every physically relevant vector field is not only useful, but essential for the theoretical (as well as experimental) investigation of that field. For example, the flux of a gravitational field through a closed surface is related to the amount of mass in the volume enclosed by the surface. Similarly, the rate of change of the flux of a magnetic field through a surface determines the line integral of the induced electric field around the boundary of the surface.



Figure 13.2: The river with its cross section. 
Figure 13.3: The flux of the electric field through a circle. The normal unit vector $\mathbf{e}_n$ could be chosen to be either up or down. We choose (quite arbitrarily) the up direction to make the flux positive for positive $q$.


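Example 13.1.2. Consider the flux of the electric field of a point charge $q$ located on the axis of a circle of radius $a$, at a distance $d$ from its center, as shown in Figure 13.3. The element of flux is given by

$$\mathbf{E}\cdot d\mathbf{a} = |\mathbf{E}|\cos\theta\,da = |\mathbf{E}|\cos\theta\,\rho\,d\rho\,d\varphi = \frac{kq}{d^2 + \rho^2}\,\frac{d}{\sqrt{d^2 + \rho^2}}\,\rho\,d\rho\,d\varphi = \frac{kqd}{(d^2 + \rho^2)^{3/2}}\,\rho\,d\rho\,d\varphi,$$

where $\mathbf{e}_n$ is chosen to point up. The polar coordinates $(\rho, \varphi)$ specify a point in the plane of the circle, and the element of area at that point is $\rho\,d\rho\,d\varphi$. To find the total flux, we integrate the last expression above:

$$\phi = \iint_S\frac{kqd\,\rho\,d\rho\,d\varphi}{(d^2 + \rho^2)^{3/2}} = kqd\int_0^{2\pi}d\varphi\int_0^a\frac{\rho\,d\rho}{(d^2 + \rho^2)^{3/2}} = 2\pi kqd\left[-(d^2 + \rho^2)^{-1/2}\right]_0^a = 2\pi kq\left(\frac{d}{\sqrt{d^2}} - \frac{d}{\sqrt{d^2 + a^2}}\right).$$

Note that since $d$ represents a distance, as opposed to a coordinate, it is always positive and $d = \sqrt{d^2} = |d|$. Hence $\phi = 2\pi kq\left(1 - d/\sqrt{d^2 + a^2}\right)$. ■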
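A midpoint-rule evaluation of the radial integral reproduces the result of Example 13.1.2. Here k, q, d, and a are set to hypothetical values. As a further check, letting a grow very large makes the flux approach 2πkq, which is half of the 4πkq that a closed surface around the charge receives.

```python
import math


def disk_flux_numeric(k, q, d, a, n=2000):
    """Midpoint sum of k q d rho / (d^2 + rho^2)^(3/2) over 0 < rho < a, times 2*pi for phi."""
    drho = a / n
    total = 0.0
    for i in range(n):
        rho = (i + 0.5) * drho
        total += k * q * d * rho / (d**2 + rho**2) ** 1.5 * drho
    return 2 * math.pi * total   # the integrand does not depend on phi


def disk_flux_exact(k, q, d, a):
    """Closed form from Example 13.1.2."""
    return 2 * math.pi * k * q * (1 - d / math.sqrt(d**2 + a**2))


print(disk_flux_numeric(1.0, 1.0, 1.0, 2.0), disk_flux_exact(1.0, 1.0, 1.0, 2.0))
print(disk_flux_exact(1.0, 1.0, 1.0, 1e6), 2 * math.pi)   # a very large disk
```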

It is often necessary to calculate the flux of a vector field through a closed 
surface bounding a volume. Intuitively, such a flux gives a measure of the 
strength of the source of the vector field in the volume. For instance, the flux 
of the velocity field of water through a closed surface bounding a fountain 
measures the rate of the water output of the fountain. If the surface does not enclose the fountain, the net flux will be zero. The flux through one “side” of the closed surface will be positive, the flux through the other “side” will be negative, and the two will cancel. In the case of an electrostatic field, the flux through a closed surface measures the amount of charge in the volume bounded by that surface. The sign of the flux requires an orientation of the bounding surface, which is equivalent to the assignment of a positive direction to the unit normal to the surface at each of its points. We agree to adhere to the convention of Box 12.1.3.⁴

4 Only orientable surfaces can have a well-defined orientation. Since we are excluding nonorientable surfaces from this book, all our surfaces respect Box 12.1.3.
Example 13.1.3. Let us consider the flux through a sphere of radius a centered at the origin of a vector field A given by $\mathbf{A}=kQr^m\hat{\mathbf{e}}_r$, with k a proportionality constant and Q the strength of the source. Assuming that the outward normal is considered positive (see Box 12.1.3), the total flux through the sphere is calculated as

$$ \phi_Q = \iint_S \mathbf{A}\cdot d\mathbf{a} = \iint_S kQa^m\,\hat{\mathbf{e}}_r\cdot\left(\hat{\mathbf{e}}_r\,a^2\sin\theta\,d\theta\,d\varphi\right) = kQ\int_0^{2\pi}d\varphi\int_0^{\pi}a^m a^2\sin\theta\,d\theta = 2\pi kQa^{m+2}\int_0^{\pi}\sin\theta\,d\theta = 4\pi kQa^{m+2}. $$

It is important to keep in mind that when calculating the flux of a vector field, one has to evaluate the field at the surface. That is why a appears in the integral rather than r. Notice how the flux depends on the radius of the sphere. If m + 2 > 0, then the farther away one moves from the origin, the more total flux passes through the sphere. On the other hand, if m + 2 < 0, although the size of the sphere increases, and therefore more area is available for the field to cross, the field decreases too rapidly to give enough flux to the large sphere, so the flux decreases. The important case of m = −2 eliminates the dependence on a: the total flux through spheres of different sizes is constant. This last statement is a special case of the content of the celebrated Gauss’s law. ■
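The dependence of the flux on the radius can also be seen numerically. The sketch below, assuming $kQ=1$ for illustration, evaluates $\phi_Q$ with the field taken on the surface $r=a$; the $\varphi$ integral contributes $2\pi$ and the $\theta$ integral is done with a midpoint rule.

```python
import math

def sphere_flux(m, a, kQ=1.0, n=400):
    """Flux of A = kQ r^m e_r through a sphere of radius a centered at the origin.
    The field is evaluated on the surface (r = a); the phi integral gives 2*pi and
    the integral of sin(theta) over [0, pi] is done with a midpoint rule."""
    h = math.pi / n
    theta_integral = h * sum(math.sin((i + 0.5) * h) for i in range(n))
    return 2 * math.pi * kQ * a**m * a**2 * theta_integral

for m in (-3, -2, 0):
    print(m, [sphere_flux(m, a) for a in (1.0, 2.0, 4.0)])
```

Running it, the m = −2 row stays at about $4\pi$ for every radius, while the m = −3 row decreases like $1/a$ and the m = 0 row grows like $a^2$.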


remember to evaluate the vector field at the surface when calculating its flux!


Space vectors were conceived as three-dimensional generalizations of complex numbers. The primary candidates for such a generalization, however, turned out to be quaternions (discovered by Hamilton), which had four components. One could naturally divide a quaternion into its “scalar” component and its vector component, the latter itself consisting of three components. The product of two quaternions, being itself a quaternion, can also be divided into scalar and vector parts. It turns out that the scalar part of the product contains the dot product of the vector parts, and the vector part of the product contains the cross product of the vector parts. However, the full product contains some extra terms.
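The split described above can be made concrete. In the sketch below a quaternion is stored (an illustrative choice) as a list [scalar, x, y, z], and Hamilton's product is built from the multiplication table implied by $i^2=j^2=k^2=ijk=-1$. For two quaternions $(a,\mathbf{v})$ and $(b,\mathbf{w})$ the scalar part comes out as $ab-\mathbf{v}\cdot\mathbf{w}$ and the vector part as $a\mathbf{w}+b\mathbf{v}+\mathbf{v}\times\mathbf{w}$; the terms $ab$, $a\mathbf{w}$, $b\mathbf{v}$ are the "extra terms".

```python
# Multiplication table of the basis units 1, i, j, k (indices 0, 1, 2, 3):
# TABLE[m][n] = (sign, index) means  e_m * e_n = sign * e_index.
TABLE = [
    [(1, 0), (1, 1), (1, 2), (1, 3)],    # 1*1 = 1, 1*i = i, 1*j = j, 1*k = k
    [(1, 1), (-1, 0), (1, 3), (-1, 2)],  # i*1 = i, i*i = -1, i*j = k, i*k = -j
    [(1, 2), (-1, 3), (-1, 0), (1, 1)],  # j*1 = j, j*i = -k, j*j = -1, j*k = i
    [(1, 3), (1, 2), (-1, 1), (-1, 0)],  # k*1 = k, k*i = j, k*j = -i, k*k = -1
]

def qmul(p, q):
    """Hamilton product of quaternions stored as [scalar, x, y, z]."""
    out = [0.0, 0.0, 0.0, 0.0]
    for m in range(4):
        for n in range(4):
            sign, idx = TABLE[m][n]
            out[idx] += sign * p[m] * q[n]
    return out

def dot(v, w):
    return sum(a * b for a, b in zip(v, w))

def cross(v, w):
    return [v[1] * w[2] - v[2] * w[1],
            v[2] * w[0] - v[0] * w[2],
            v[0] * w[1] - v[1] * w[0]]

v, w = [1.0, 2.0, 3.0], [4.0, -1.0, 2.0]
prod = qmul([0.0] + v, [0.0] + w)   # product of two purely "vector" quaternions
print(prod[0], -dot(v, w))          # scalar part: minus the dot product
print(prod[1:], cross(v, w))        # vector part: the cross product
```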

Physicists, on the other hand, were seeking a concept that was more closely 
associated with Cartesian coordinates than quaternions were. The first step in this 
direction was taken by James Clerk Maxwell. Maxwell singled out the scalar and the 
vector parts of Hamilton’s quaternion and put the emphasis on these separate parts. 
In his celebrated A Treatise on Electricity and Magnetism (1873) he does speak of 
quaternions but treats the scalar and the vector parts separately. 

Hamilton also developed a calculus of quaternions. In fact, the gradient operator introduced in Definition 12.3.2 and its name “nabla” were both Hamilton’s invention.⁵ Hamilton showed that if ∇ acts on the vector part v of a quaternion, the result will be a quaternion. Maxwell recognized the scalar part of this quaternion to be (up to a sign) the divergence (to be discussed in the next section) of the vector v, and the vector part to be the curl (to be discussed in Section 14.2) of v.

Maxwell often used quaternions as the basic mathematical entity, or he at least made frequent reference to quaternions, perhaps to help his readers. Nevertheless, his work made it clear that vectors were the real tool for physical thinking and not just an abbreviated scheme of writing, as some mathematicians maintained. Thus by Maxwell’s time a great deal of vector analysis had been created by treating the scalar and vector parts of quaternions separately.

5 He used the word “nabla” because ∇ looks like an ancient Hebrew instrument of that name.

The formal break with quaternions and the inauguration of a new independent 
subject, vector analysis, was made independently by Josiah Willard Gibbs and Oliver 
Heaviside in the early 1880s. 


13.1.1 Flux Through an Arbitrary Surface 

It may be useful to have a general formula for calculating the flux through 
an arbitrary surface whose equation is given in parametric form in Cartesian 
coordinates. Let 


$$ x=f(u,v),\qquad y=g(u,v),\qquad z=h(u,v), \tag{13.4} $$


be the parametric equation of a surface. When v is held fixed and u is allowed 
to vary, a curve is traced on the surface whose infinitesimal displacement can 
be written as [see Equation (6.63)] 


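$$ d\mathbf{l}_1 = \hat{\mathbf{e}}_x\frac{\partial f}{\partial u}\,du + \hat{\mathbf{e}}_y\frac{\partial g}{\partial u}\,du + \hat{\mathbf{e}}_z\frac{\partial h}{\partial u}\,du. $$

Similarly, the infinitesimal displacement along curves of constant u is

$$ d\mathbf{l}_2 = \hat{\mathbf{e}}_x\frac{\partial f}{\partial v}\,dv + \hat{\mathbf{e}}_y\frac{\partial g}{\partial v}\,dv + \hat{\mathbf{e}}_z\frac{\partial h}{\partial v}\,dv. $$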


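The cross product of these two displacements is the element of area of the surface:

$$ d\mathbf{a} = d\mathbf{l}_1\times d\mathbf{l}_2 = \det\begin{pmatrix}\hat{\mathbf{e}}_x & \hat{\mathbf{e}}_y & \hat{\mathbf{e}}_z\\ \partial f/\partial u & \partial g/\partial u & \partial h/\partial u\\ \partial f/\partial v & \partial g/\partial v & \partial h/\partial v\end{pmatrix}du\,dv = \det\begin{pmatrix}\hat{\mathbf{e}}_x & \hat{\mathbf{e}}_y & \hat{\mathbf{e}}_z\\ \partial x/\partial u & \partial y/\partial u & \partial z/\partial u\\ \partial x/\partial v & \partial y/\partial v & \partial z/\partial v\end{pmatrix}du\,dv. $$

Using this in (13.3) we get

$$ \phi = \iint_R \det\begin{pmatrix}A_x & A_y & A_z\\ \partial x/\partial u & \partial y/\partial u & \partial z/\partial u\\ \partial x/\partial v & \partial y/\partial v & \partial z/\partial v\end{pmatrix}du\,dv, \tag{13.5} $$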


where $A_x$, $A_y$, and $A_z$ are considered functions of u and v obtained by substituting (13.4) for their arguments. Equation (13.5) is an integral over a region R in the uv-plane determined by the range of the variables u and v sufficient to describe the surface S.
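Equation (13.5) lends itself directly to numerical evaluation. The sketch below is a minimal midpoint-rule version; the helper names are hypothetical. As a test case it assumes the sphere parametrization $x=a\sin u\cos v$, $y=a\sin u\sin v$, $z=a\cos u$ with $0\le u\le\pi$, $0\le v\le 2\pi$, for which the row order of the determinant gives the outward normal, and computes the flux of $\mathbf{A}=\mathbf{r}$, whose exact value is $4\pi a^3$.

```python
import math

def det3(r0, r1, r2):
    """Determinant of the 3x3 matrix whose rows are r0, r1, r2."""
    return (r0[0] * (r1[1] * r2[2] - r1[2] * r2[1])
            - r0[1] * (r1[0] * r2[2] - r1[2] * r2[0])
            + r0[2] * (r1[0] * r2[1] - r1[1] * r2[0]))

def parametric_flux(A, surf, r_u, r_v, u_range, v_range, n=200):
    """Midpoint-rule evaluation of Eq. (13.5): the double integral over R of
    det[A; d(x,y,z)/du; d(x,y,z)/dv] du dv, with A evaluated on the surface."""
    (u0, u1), (v0, v1) = u_range, v_range
    hu, hv = (u1 - u0) / n, (v1 - v0) / n
    total = 0.0
    for i in range(n):
        u = u0 + (i + 0.5) * hu
        for j in range(n):
            v = v0 + (j + 0.5) * hv
            total += det3(A(*surf(u, v)), r_u(u, v), r_v(u, v))
    return total * hu * hv

a = 2.0  # sphere radius (illustrative)

def sphere(u, v):
    return (a * math.sin(u) * math.cos(v), a * math.sin(u) * math.sin(v), a * math.cos(u))

def sphere_u(u, v):  # partial derivatives of (x, y, z) with respect to u
    return (a * math.cos(u) * math.cos(v), a * math.cos(u) * math.sin(v), -a * math.sin(u))

def sphere_v(u, v):  # partial derivatives of (x, y, z) with respect to v
    return (-a * math.sin(u) * math.sin(v), a * math.sin(u) * math.cos(v), 0.0)

def field_r(x, y, z):  # the field A = r
    return (x, y, z)

flux = parametric_flux(field_r, sphere, sphere_u, sphere_v, (0.0, math.pi), (0.0, 2 * math.pi))
print(flux, 4 * math.pi * a**3)
```

Reversing the roles of u and v would flip the sign of the determinant, which is why the order of the parameters must be chosen to match the desired orientation.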

The special but important case of a surface given by z = f(x, y) deserves special attention. In this case the parametrization is

$$ x=u,\qquad y=v,\qquad z=f(u,v), $$
and (13.5) yields

$$ \phi = \iint_R \det\begin{pmatrix}A_x & A_y & A_z\\ 1 & 0 & \partial z/\partial u\\ 0 & 1 & \partial z/\partial v\end{pmatrix}du\,dv, $$

or, writing (x, y) for (u, v),

$$ \phi = \iint_R\left(-A_x\frac{\partial z}{\partial x} - A_y\frac{\partial z}{\partial y} + A_z\right)dx\,dy, \tag{13.6} $$

where R is the projection of the surface S onto the xy-plane.
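For a surface given as a graph, Equation (13.6) reduces the flux (with the upward normal) to a double integral over the projection R. As an illustrative test case, not taken from the text, take $\mathbf{A}=\mathbf{r}$ and the paraboloid $z=1-x^2-y^2$ above the xy-plane. Then R is the unit disk, the integrand is $-x(-2x)-y(-2y)+(1-x^2-y^2)=1+x^2+y^2$, and the exact flux is $3\pi/2$. The sketch below assumes that R is the unit disk and covers it with a polar midpoint grid.

```python
import math

def graph_flux_unit_disk(A, f, f_x, f_y, n=200):
    """Eq. (13.6) for a surface z = f(x, y) whose projection R is the unit disk:
    integrate (-A_x f_x - A_y f_y + A_z) dx dy, with A evaluated on the surface.
    R is covered with a polar midpoint grid, so dx dy becomes r dr dp."""
    hr, hp = 1.0 / n, 2 * math.pi / n
    total = 0.0
    for i in range(n):
        r = (i + 0.5) * hr
        for j in range(n):
            p = (j + 0.5) * hp
            x, y = r * math.cos(p), r * math.sin(p)
            Ax, Ay, Az = A(x, y, f(x, y))
            total += (-Ax * f_x(x, y) - Ay * f_y(x, y) + Az) * r
    return total * hr * hp

def A(x, y, z):
    return (x, y, z)            # the field A = r

def f(x, y):
    return 1 - x**2 - y**2      # paraboloid above the xy-plane

def f_x(x, y):
    return -2 * x

def f_y(x, y):
    return -2 * y

print(graph_flux_unit_disk(A, f, f_x, f_y), 3 * math.pi / 2)
```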


13.2 Flux Density = Divergence 

The connection between flux and the strength of the source of a vector field 
was mentioned above. We now analyze this connection further. The variation 
in the strength of the source of a vector field is measured by the density of 
the source. For example, the variation in the strength—concentration—of 
the source of electrostatic (gravitational) field is measured by charge (mass) 
density. We expect this variation to influence the intensity of flux at various 
points in space. 


13.2.1 Flux Density 

Densities are physical quantities treated locally. A local consideration of flux, 
therefore, requires the introduction of the notion of flux density: 


Box 13.2.1. Take a small volume around a point P, evaluate the total flux of a vector field through the bounding surface of the volume, and divide the result by the volume to get the flux density or divergence of the vector field at P.


notion of flux density and divergence of a vector field introduced


We denote the flux density by $\rho_\phi$ for the moment. Later we shall introduce another notation which is more commonly used.

Let us quantify the discussion above for a vector field A. Consider a small rectangular⁶ volume ΔV centered at P with coordinates (x, y, z). Let the sides of the box be Δx, Δy, and Δz as in Figure 13.4. We are interested in the net outward⁷ flux of the vector field A(x, y, z). The six faces of the box are assumed to be so small that the angle between the normal to each face and the vector field A is constant over the area of the face. Since we are calculating the outward flux, we must assume that $\hat{\mathbf{e}}_n$ always points out of the volume.

6 The rectangular shape of the volume is not a restriction because it will be made smaller and smaller at the end. In such a limit, any volume can be built from (a large number of) these small rectangular boxes. Compare this with the rectangular strips used in calculating the area under a curve.

Figure 13.4: The flux of the vector field A through a closed infinitesimal rectangular surface.

The total flux Δφ through the surface can be written as

$$ \Delta\phi = (\Delta\phi_1+\Delta\phi_2)+(\Delta\phi_3+\Delta\phi_4)+(\Delta\phi_5+\Delta\phi_6), $$

where each pair of parentheses indicates one coordinate axis. For instance, $\Delta\phi_1$ is the flux through the face having a normal component along the positive x-axis, $\Delta\phi_2$ is the flux through the face having a normal component along the negative x-axis, and so on. Let us first look at $\Delta\phi_1$, which can be written as

$$ \Delta\phi_1 = \mathbf{A}_1\cdot\hat{\mathbf{e}}_{n_1}\,\Delta a_1 $$

or, since $\hat{\mathbf{e}}_{n_1}$ is the same as $\hat{\mathbf{e}}_x$,

$$ \Delta\phi_1 = \mathbf{A}_1\cdot\hat{\mathbf{e}}_x\,\Delta a_1 = A_{1x}\,\Delta a_1. $$


This requires some explanation. The subscript 1 in $A_{1x}$ indicates the evaluation of the vector field at the midpoint⁸ of the first face. The subscript x in $A_{1x}$, of course, means the x-component. So, $A_{1x}$ means the x-component of A evaluated at the midpoint of the first face; $\Delta a_1$ is the area of face 1, which is simply $\Delta y\,\Delta z$ (see Figure 13.4). The center of the box, point P, has coordinates (x, y, z) by assumption. Thus, the midpoint of face 1 will have coordinates (x + Δx/2, y, z). Therefore,

$$ \Delta\phi_1 = A_x\!\left(x+\frac{\Delta x}{2},\,y,\,z\right)\Delta y\,\Delta z. \tag{13.7} $$

7 The choice of outward direction is dictated by Box 12.1.3.

8 The restriction to the midpoint is only for convenience. Since the area is small, any other point of the face can be used.
The flux density that we are evaluating will be the density at P. Thus, as a function of the three coordinates, the result will have to be given at the coordinates of P, namely at (x, y, z). This means that in Equation (13.7), all quantities must have (x, y, z) as their arguments. This suggests expanding the function on the RHS of Equation (13.7) as a Taylor series about the point (x, y, z). Recall from Chapter 10 that

$$ f(x+\Delta x,\,y+\Delta y,\,z+\Delta z) = \sum_{i,j,k=0}^{\infty}\frac{1}{i!\,j!\,k!}\,\frac{\partial^{i+j+k}f(x,y,z)}{\partial x^i\,\partial y^j\,\partial z^k}\,(\Delta x)^i(\Delta y)^j(\Delta z)^k. $$

We are interested only in the first power because the size of the box will eventually tend to zero. Therefore, we write this in the following abbreviated form:

$$ f(x+\Delta x,\,y+\Delta y,\,z+\Delta z) = f(x,y,z) + \Delta x\frac{\partial f}{\partial x} + \Delta y\frac{\partial f}{\partial y} + \Delta z\frac{\partial f}{\partial z} + \cdots, \tag{13.8} $$

where it is understood that all derivatives are evaluated at (x, y, z). Applying this result to the function on the RHS of Equation (13.7), for which the increment in x is Δx/2 and the increments in y and z are zero, yields


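$$ A_x\!\left(x+\frac{\Delta x}{2},\,y,\,z\right) = A_x(x,y,z) + \frac{\Delta x}{2}\frac{\partial A_x}{\partial x} + 0 + 0 + \cdots $$

and

$$ \Delta\phi_1 = \left\{A_x(x,y,z) + \frac{\Delta x}{2}\frac{\partial A_x}{\partial x} + \cdots\right\}\Delta y\,\Delta z. $$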


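Similarly, for the second face we obtain

$$ \Delta\phi_2 = \mathbf{A}_2\cdot\hat{\mathbf{e}}_{n_2}\,\Delta a_2 = \mathbf{A}_2\cdot(-\hat{\mathbf{e}}_x)\,\Delta a_2 = -A_{2x}\,\Delta y\,\Delta z = -A_x\!\left(x-\frac{\Delta x}{2},\,y,\,z\right)\Delta y\,\Delta z = -\left\{A_x(x,y,z) - \frac{\Delta x}{2}\frac{\partial A_x}{\partial x} + \cdots\right\}\Delta y\,\Delta z. $$

Adding the expressions for $\Delta\phi_1$ and $\Delta\phi_2$, we obtain

$$ \Delta\phi_1+\Delta\phi_2 = \left\{A_x(x,y,z) + \frac{\Delta x}{2}\frac{\partial A_x}{\partial x} - A_x(x,y,z) + \frac{\Delta x}{2}\frac{\partial A_x}{\partial x} + \cdots\right\}\Delta y\,\Delta z $$

or

$$ \Delta\phi_1+\Delta\phi_2 = \frac{\partial A_x}{\partial x}\,\Delta x\,\Delta y\,\Delta z + \cdots = \frac{\partial A_x}{\partial x}\,\Delta V + \cdots. $$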
origin of the term “divergence”

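The reader may check that

$$ \Delta\phi_3+\Delta\phi_4 = \frac{\partial A_y}{\partial y}\,\Delta V + \cdots, \qquad \Delta\phi_5+\Delta\phi_6 = \frac{\partial A_z}{\partial z}\,\Delta V + \cdots, $$

so that the total flux through the small box is

$$ \Delta\phi = \left(\frac{\partial A_x}{\partial x}+\frac{\partial A_y}{\partial y}+\frac{\partial A_z}{\partial z}\right)\Delta V + \cdots. \tag{13.9} $$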


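The flux density, or divergence as it is more often called, can now be obtained by dividing both sides by ΔV and taking the limit as ΔV → 0. Since all the terms represented by dots are of at least fourth order in the small increments, they are still of at least first order after division by ΔV, so they vanish in the limit and we obtain

Theorem 13.2.1. The relation between the flux density of a vector field and the derivatives of its components is

$$ \rho_\phi = \operatorname{div}\mathbf{A} = \nabla\cdot\mathbf{A} = \lim_{\Delta V\to 0}\frac{\Delta\phi}{\Delta V} = \frac{\partial A_x}{\partial x}+\frac{\partial A_y}{\partial y}+\frac{\partial A_z}{\partial z}. $$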
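The limit in Theorem 13.2.1 can be watched numerically. The sketch below mimics the derivation: it approximates the outward flux through a cube of side h centered at P by evaluating the normal component at the midpoint of each face, as in Equation (13.7), divides by $\Delta V=h^3$, and compares the result with the sum of partial derivatives. The sample field $\mathbf{A}=(x^2y,\;yz,\;\sin z)$ and the point P are arbitrary choices made for illustration.

```python
import math

def A(x, y, z):
    # Sample field chosen for illustration only
    return (x * x * y, y * z, math.sin(z))

def div_A(x, y, z):
    # Exact divergence of the sample field: dAx/dx + dAy/dy + dAz/dz
    return 2 * x * y + z + math.cos(z)

def flux_density(field, x, y, z, h):
    """Outward flux through a cube of side h centered at (x, y, z), using the
    field at the midpoint of each face, divided by the cube's volume h**3."""
    area = h * h
    flux = 0.0
    for axis in range(3):
        plus = [x, y, z]
        minus = [x, y, z]
        plus[axis] += h / 2   # midpoint of the face with outward normal +e_axis
        minus[axis] -= h / 2  # midpoint of the face with outward normal -e_axis
        flux += (field(*plus)[axis] - field(*minus)[axis]) * area
    return flux / h**3

P = (1.0, 2.0, 0.5)
for h in (0.1, 0.01, 0.001):
    print(h, flux_density(A, *P, h), div_A(*P))
```

The discrepancy shrinks roughly like $h^2$ as the box is made smaller, in line with the neglected higher-order terms of the Taylor expansion.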


The term “divergence,” whose abbreviation is used as a symbol of flux density, is reminiscent of water flowing away from its source, a fountain. In this context, the flux density measures how quickly or intensely water “diverges” away from the fountain. The third notation, ∇·A, combines the dot product in terms of components with the definition of ∇ as given in Equation (12.28).⁹


13.2.2 Divergence Theorem 

The use of the word (volume) density for divergence suggests that the total 
flux through a (large) surface should be the (volume) integral of divergence. 
However, any calculation of flux—even locally—requires a surface, as we saw 
in the derivation of flux density. What are the “small” surfaces used in the 
calculation of flux density, and how is the large surface related to them? The 
answer to this question will come out of a treatment of an important theorem 
in vector calculus which we investigate now. 

First consider two boxes with one face in common (Figure 13.5) and index quantities related to the volume on the left by a and those related to the one on the right by b. The total flux is, of course, the sum of the fluxes through all six faces of the composite box:

$$ \Delta\phi = (\Delta\phi_1+\Delta\phi_2)+(\Delta\phi_3+\Delta\phi_4)+(\Delta\phi_5+\Delta\phi_6), $$

9 This notation is misleading because, as we shall see later, in non-Cartesian coordinate systems, the expression of divergence in terms of derivatives will not be equal to simply the dot product of ∇ with the vector field. One should really think of ∇·A as a symbol, equivalent to $\rho_\phi$ or div A, and not as an operation involving two vectors.
Figure 13.5: The common boundaries contribute no net flux.

where, as before, $\Delta\phi_1$ is the total flux through the face having a normal in the positive x-direction, $\Delta\phi_2$ that through the face having a normal in the negative x-direction, and so on. It is evident from Figure 13.5 that

$$ \Delta\phi_1 = \Delta\phi_{a1} + \Delta\phi_{b1}, $$

where $\Delta\phi_{a1}$ is the flux through the positive x face of box a and $\Delta\phi_{b1}$ is the flux through the positive x face of box b. Using a similar notation, we can write

$$ \Delta\phi_2 = \Delta\phi_{a2} + \Delta\phi_{b2}, \qquad \Delta\phi_5 + \Delta\phi_6 = \Delta\phi_{a5} + \Delta\phi_{b5} + \Delta\phi_{a6} + \Delta\phi_{b6}. $$

However, for the y faces we have $\Delta\phi_3 = \Delta\phi_{b3}$ and $\Delta\phi_4 = \Delta\phi_{a4}$, because the face of the composite box in the positive y-direction belongs to box b and that in the negative y-direction to box a. Now note that the outward flux through the left face of box b is the negative of the outward flux through the right face of box a; that is,

$$ \Delta\phi_{b4} = -\Delta\phi_{a3} \quad\Longrightarrow\quad \Delta\phi_{b4} + \Delta\phi_{a3} = 0. $$

Thus, we obtain

$$ \Delta\phi_3 + \Delta\phi_4 = \Delta\phi_{b3} + \Delta\phi_{a4} = \Delta\phi_{a3} + \Delta\phi_{b3} + \Delta\phi_{a4} + \Delta\phi_{b4}. $$

Using all the above relations yields

$$ \Delta\phi = (\Delta\phi_{a1}+\Delta\phi_{a2}) + (\Delta\phi_{a3}+\Delta\phi_{a4}) + (\Delta\phi_{a5}+\Delta\phi_{a6}) + (\Delta\phi_{b1}+\Delta\phi_{b2}) + (\Delta\phi_{b3}+\Delta\phi_{b4}) + (\Delta\phi_{b5}+\Delta\phi_{b6}) $$

or $\Delta\phi = \Delta\phi_a + \Delta\phi_b$, or $\Delta\phi = (\nabla\cdot\mathbf{A})_a\,\Delta V_a + (\nabla\cdot\mathbf{A})_b\,\Delta V_b$. These equations say that

Box 13.2.2. The total flux through the outer surface of a composite box 
consisting of two adjacent boxes is equal to the sum of the total fluxes 
through the bounding surfaces of the two boxes, including the common 
boundary. Stated differently, in summing the total outward flux of adjacent 
boxes, the contributions of the common boundary cancel. 




















the very important divergence theorem


It is now clear how to generalize to a large surface bounding a volume: Divide up the volume into N rectangular boxes and write $\phi \approx \sum_{j=1}^{N}(\nabla\cdot\mathbf{A})_j\,\Delta V_j$. The LHS of this equation is the outward flux through the bounding surface only. Contributions from the sides of all inner boxes cancel out because each face of a typical inner box is shared by another box whose outward flux through that face is the negative of the outward flux of the original box. However, boxes at the boundary cannot find enough boxes to cancel all their flux contributions, leaving precisely the flux through the original surface. The use of the approximation sign here reflects the fact that N, although large, is not infinite, and that the boxes are not small enough. To attain equality we must make the boxes smaller and smaller and their number larger and larger, in which case we approach the integral:


$$ \phi = \iiint_V \nabla\cdot\mathbf{A}\,dV. \tag{13.10} $$


Then, using Equation (13.2), we can state the important 

Theorem 13.2.2. (Divergence Theorem). The surface integral (flux) of any vector field A through a closed surface S bounding a volume V is equal to the volume integral of the divergence (or flux density) of A:

$$ \oint\!\!\oint_S \mathbf{A}\cdot d\mathbf{a} = \iiint_V \nabla\cdot\mathbf{A}\,dV. \tag{13.11} $$
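The theorem is easy to test on a concrete case. The sketch below assumes the unit cube $0\le x,y,z\le 1$ and the sample field $\mathbf{A}=(xy,\;yz^2,\;xz)$, both illustrative choices, for which $\nabla\cdot\mathbf{A}=y+z^2+x$ and both sides of Equation (13.11) equal 4/3. It computes the outward flux through the six faces and the volume integral of the divergence with the same midpoint rule.

```python
def A(x, y, z):
    return (x * y, y * z * z, x * z)   # sample field (illustrative)

def div_A(x, y, z):
    return y + z * z + x               # its divergence

def midpoints(n):
    return [(i + 0.5) / n for i in range(n)]

def surface_flux(field, n=50):
    """Outward flux through the six faces of the unit cube, midpoint rule."""
    pts, w = midpoints(n), 1.0 / n**2
    total = 0.0
    for s in pts:
        for t in pts:
            total += (field(1, s, t)[0] - field(0, s, t)[0]) * w  # faces x = 1, x = 0
            total += (field(s, 1, t)[1] - field(s, 0, t)[1]) * w  # faces y = 1, y = 0
            total += (field(s, t, 1)[2] - field(s, t, 0)[2]) * w  # faces z = 1, z = 0
    return total

def volume_integral(g, n=50):
    """Integral of g over the unit cube, midpoint rule."""
    pts, w = midpoints(n), 1.0 / n**3
    return w * sum(g(x, y, z) for x in pts for y in pts for z in pts)

print(surface_flux(A), volume_integral(div_A))  # both close to 4/3
```

On the faces x = 0, y = 0, and z = 0 the normal component of this particular field vanishes, so only the three far faces contribute to the surface side.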


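Let $\mathbf{A}=\mathbf{c}f$, where c is an arbitrary constant vector and f a function. Applying the divergence theorem to this A and using the readily verifiable identity $\nabla\cdot(\mathbf{c}f)=\mathbf{c}\cdot\nabla f$, we get

$$ \oint\!\!\oint_S f\,\mathbf{c}\cdot d\mathbf{a} = \iiint_V \mathbf{c}\cdot(\nabla f)\,dV \quad\text{or}\quad \mathbf{c}\cdot\left(\oint\!\!\oint_S f\,d\mathbf{a} - \iiint_V \nabla f\,dV\right) = 0. $$

Since this holds for any c, we must have

$$ \oint\!\!\oint_S f\,d\mathbf{a} = \iiint_V \nabla f\,dV. \tag{13.12} $$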


Example 13.2.3. In this example we derive Gauss’s law for fields which vary as the inverse of distance squared, specifically, gravitational and electrostatic fields. Let Q be a source point (a point charge or a point mass) located at $P_0$ with position vector $\mathbf{r}_0$, and S a closed surface bounding a volume V. Let A(r) denote the field produced by Q at the field point P with position vector r as shown in Figure 13.6(a). We know that

$$ \mathbf{A}(\mathbf{r}) = \frac{KQ}{|\mathbf{r}-\mathbf{r}_0|^3}\,(\mathbf{r}-\mathbf{r}_0). \tag{13.13} $$
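The flux of A through S can be written immediately:

$$ \oint\!\!\oint_S \mathbf{A}\cdot d\mathbf{a} = KQ\oint\!\!\oint_S \frac{(\mathbf{r}-\mathbf{r}_0)\cdot d\mathbf{a}}{|\mathbf{r}-\mathbf{r}_0|^3}. $$

But the RHS is, apart from a constant, the solid angle subtended by S about $P_0$. Using Equation (12.7), we have

$$ \oint\!\!\oint_S \mathbf{A}\cdot d\mathbf{a} = \begin{cases} 4\pi KQ & \text{if } P_0 \text{ is in } V,\\ 0 & \text{if } P_0 \text{ is not in } V.\end{cases} \tag{13.14} $$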


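If there are N point sources $Q_1, Q_2, \ldots, Q_N$, located at $\mathbf{r}_1, \mathbf{r}_2, \ldots, \mathbf{r}_N$, then A will be the sum of individual contributions, and we have

$$ \oint\!\!\oint_S \mathbf{A}\cdot d\mathbf{a} = \oint\!\!\oint_S \sum_{k=1}^{N}\frac{KQ_k(\mathbf{r}-\mathbf{r}_k)\cdot d\mathbf{a}}{|\mathbf{r}-\mathbf{r}_k|^3} = K\sum_{k=1}^{N}Q_k\oint\!\!\oint_S\frac{(\mathbf{r}-\mathbf{r}_k)\cdot d\mathbf{a}}{|\mathbf{r}-\mathbf{r}_k|^3} = K\sum_{k=1}^{N}Q_k\,\Omega_k, $$

where $\Omega_k$ is zero if $Q_k$ is outside V, and 4π if it is inside [see Figure 13.6(b)]. Thus, only the sources enclosed in the volume will contribute to the sum and we have

$$ \oint\!\!\oint_S \mathbf{A}\cdot d\mathbf{a} = 4\pi KQ_{\rm enc}, \tag{13.15} $$

where $Q_{\rm enc}$ is the amount of source enclosed in S.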

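For electrostatics, $K = k_e = 1/4\pi\epsilon_0$, Q = q, and A = E, so that

$$ \oint\!\!\oint_S \mathbf{E}\cdot d\mathbf{a} = \frac{q_{\rm enc}}{\epsilon_0}. \tag{13.16} $$

For gravitation, K = −G, Q = M, and A = g, so that

$$ \oint\!\!\oint_S \mathbf{g}\cdot d\mathbf{a} = -4\pi GM_{\rm enc}. \tag{13.17} $$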


global (integral) form of Gauss’s law

local (differential) form of Gauss’s law


The minus sign appears in the gravitational case because gravity is always attractive. Gauss’s law is very useful in calculating the fields of highly symmetric source distributions, and it is put to good use in introductory electromagnetic discussions. The derivation above shows that it is just as useful in gravitational calculations. ■
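Equation (13.14) can be checked numerically for a closed surface that is not a sphere. The sketch below assumes K = Q = 1 and takes S to be the surface of the cube $-1\le x,y,z\le 1$ (an illustrative choice); it integrates $\mathbf{A}\cdot d\mathbf{a}$ over the six faces with a midpoint rule, for one source inside the cube and one outside.

```python
import math

def point_field(r, r0, KQ=1.0):
    """Inverse-square field of Eq. (13.13): KQ (r - r0) / |r - r0|^3."""
    d = [r[i] - r0[i] for i in range(3)]
    dist3 = (d[0] ** 2 + d[1] ** 2 + d[2] ** 2) ** 1.5
    return [KQ * di / dist3 for di in d]

def cube_flux(r0, n=100):
    """Outward flux of point_field through the surface of the cube [-1, 1]^3."""
    h = 2.0 / n
    pts = [-1 + (i + 0.5) * h for i in range(n)]
    total = 0.0
    for s in pts:
        for t in pts:
            for axis in range(3):
                for side in (1.0, -1.0):
                    r = [s, t]            # the two coordinates that vary on the face
                    r.insert(axis, side)  # fix the coordinate normal to the face
                    # outward normal is side * e_axis
                    total += side * point_field(r, r0)[axis] * h * h
    return total

print(cube_flux((0.3, -0.2, 0.1)), 4 * math.pi)  # source inside S
print(cube_flux((2.5, 0.0, 0.0)))                # source outside S
```

The first value is close to $4\pi$ and the second close to 0, independently of where inside (or outside) the cube the source sits, as long as it is not too close to the surface for the crude quadrature.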

Equation (13.15) is the integral or global form of Gauss’s law. We can also derive the differential or local form of Gauss’s law by invoking the divergence theorem and assigning a volume density $\rho_Q$ to $Q_{\rm enc}$:

$$ \text{LHS} = \iiint_V \nabla\cdot\mathbf{A}\,dV, \qquad \text{RHS} = 4\pi K\iiint_V \rho_Q\,dV. $$

Since these relations are true for arbitrary V, we obtain

Theorem 13.2.4. (Differential Form of Gauss’s Law). If a point source produces a vector field A that obeys Equation (13.13), then for any volume distribution $\rho_Q$ of the source we have $\nabla\cdot\mathbf{A} = 4\pi K\rho_Q$.

This can easily be specialized to the two cases of interest, electrostatics 
and gravity. 
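Theorem 13.2.4 can be illustrated with a uniformly filled ball of radius R and source density $\rho_0$. Applying Equation (13.15) to a concentric sphere gives $\mathbf{A}=(4\pi/3)K\rho_0\,\mathbf{r}$ inside the ball and $\mathbf{A}=KQ\,\mathbf{r}/r^3$ outside, with $Q=(4\pi/3)R^3\rho_0$. The sketch below, with illustrative values of K, $\rho_0$, and R, estimates $\nabla\cdot\mathbf{A}$ by central differences: inside it should be $4\pi K\rho_0$, outside (where there is no source) it should be 0.

```python
import math

K, rho0, R = 1.0, 2.0, 1.0             # illustrative constants
Q = 4.0 / 3.0 * math.pi * R**3 * rho0  # total source in the ball

def A(x, y, z):
    """Field of a uniform ball, obtained from the integral form of Gauss's law."""
    r = math.sqrt(x * x + y * y + z * z)
    if r < R:
        c = 4.0 / 3.0 * math.pi * K * rho0   # inside: A = (4 pi / 3) K rho0 r
    else:
        c = K * Q / r**3                     # outside: A = K Q r / r^3
    return (c * x, c * y, c * z)

def divergence(F, x, y, z, h=1e-5):
    """Central-difference estimate of the divergence of F at (x, y, z)."""
    return ((F(x + h, y, z)[0] - F(x - h, y, z)[0])
            + (F(x, y + h, z)[1] - F(x, y - h, z)[1])
            + (F(x, y, z + h)[2] - F(x, y, z - h)[2])) / (2 * h)

print(divergence(A, 0.2, -0.3, 0.4), 4 * math.pi * K * rho0)  # inside the ball
print(divergence(A, 1.5, 0.5, -0.7))                         # outside the ball
```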


13.2.3 Continuity Equation 

To improve our physical intuition of divergence, let us consider the flow of a fluid of density ρ(x, y, z, t) and velocity v(x, y, z, t). The arguments to follow are more general. They can be applied to the flow (bulk motion) of many physical quantities such as charge, mass, energy, momentum, etc. All that needs to be done is to replace ρ, which is the mass density for the fluid flow, with the density of the physical quantity.

We are interested in the amount of matter crossing a surface area Δa per unit time. We denote this quantity momentarily by ΔM, and because of its importance and wide use in various areas of physics, we shall derive it in some detail. Take a small volume ΔV of the fluid in the shape of a slanted cylinder. The lateral side of this volume is chosen to be instantaneously in the same direction as the velocity v of the particles in the volume. For large volumes this may not be possible, because the macroscopic motion of particles is, in general, not smooth, with different parts having completely different velocities. However, if the volume ΔV (as well as the time interval of observation) is taken small enough, the variation in the velocity of the enclosed particles will be negligible. This situation is shown in Figure 13.7. The lateral length of the cylinder is vΔt, where Δt is the time it takes the particles inside to go from the base to the top, so that all particles inside will have crossed the top of the cylinder in this time interval. Thus, we have

$$ \text{amount crossing top} = \text{amount in } \Delta V = \rho\,\Delta V. $$

But $\Delta V = (\mathbf{v}\,\Delta t)\cdot\Delta\mathbf{a} = \mathbf{v}\cdot\Delta\mathbf{a}\,\Delta t$, where the dot product has been used because the base and the top are not perpendicular to the lateral surface.
Figure 13.7: The flux through a small area is related to the current density.

Therefore,

$$ \Delta M = \frac{\text{amount crossing top}}{\Delta t} = \frac{\rho\,\mathbf{v}\cdot\Delta\mathbf{a}\,\Delta t}{\Delta t} = (\rho\mathbf{v})\cdot\Delta\mathbf{a}. $$

The RHS of this equation is the flux of the vector field ρv, which is called the mass current density and is usually denoted as J.

As indicated earlier, this result is general and applies to any physical quantity in motion. We can therefore rewrite the equation in its most general form as

$$ \Delta\phi_Q = (\rho_Q\mathbf{v})\cdot\Delta\mathbf{a} = \mathbf{J}_Q\cdot\Delta\mathbf{a}. \tag{13.18} $$

This is so important that we state it in words: 


Box 13.2.3. The amount of a flowing physical quantity Q crossing an area Δa per unit time is the flux $\mathbf{J}_Q\cdot\Delta\mathbf{a}$. The current density $\mathbf{J}_Q$ at each point is simply the product of volume density and velocity vector at that point.
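A small worked instance of this box, with made-up numbers chosen only for illustration: water of density 1000 kg/m³ moving with velocity (3, 0, 4) m/s through an area of 0.02 m² whose normal points along +z. The sketch computes the volume of the slanted cylinder from its height $|\mathbf{v}|\Delta t\cos\theta$ and checks that the resulting rate agrees with the flux $(\rho\mathbf{v})\cdot\Delta\mathbf{a}$; both come to about 80 kg/s.

```python
import math

rho = 1000.0                 # illustrative density, kg/m^3
v = (3.0, 0.0, 4.0)          # illustrative velocity, m/s (speed 5 m/s)
da = (0.0, 0.0, 0.02)        # small area 0.02 m^2 with normal along +z
dt = 0.1                     # time interval, s

# Volume of the slanted cylinder: base area times height |v| dt cos(theta)
speed = math.sqrt(sum(c * c for c in v))
area = math.sqrt(sum(c * c for c in da))
cos_theta = sum(a * b for a, b in zip(v, da)) / (speed * area)
dV = area * speed * dt * cos_theta

rate_from_volume = rho * dV / dt                          # amount crossing per unit time
rate_from_flux = rho * sum(a * b for a, b in zip(v, da))  # (rho v) . da
print(rate_from_volume, rate_from_flux)
```

Only the component of v along the normal matters: the sideways 3 m/s contributes nothing to the amount crossing the area.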


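For a (large) surface S we need to integrate the above relation:

$$ \phi_Q = \iint_S (\rho_Q\mathbf{v})\cdot d\mathbf{a} = \iint_S \mathbf{J}_Q\cdot d\mathbf{a}, \tag{13.19} $$

and if S is closed, the divergence theorem gives

$$ \phi_Q = \oint\!\!\oint_S \mathbf{J}_Q\cdot d\mathbf{a} = \iiint_V \nabla\cdot\mathbf{J}_Q\,dV. \tag{13.20} $$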

current density

relation between flux and current density

Let Q, which may change with time, denote the total amount of physical quantity in the volume V. Then, clearly

$$ Q(t) = \iiint_V \rho_Q\,dV = \iiint_V \rho_Q(\mathbf{r},t)\,dV(\mathbf{r}), $$

where in the last integral we have emphasized the dependence of various quantities on location and time. Now, if Q is a conserved quantity such as energy, momentum, charge, or mass,¹⁰ the amount of Q that crosses S outward (i.e., the flux through S) must precisely equal the rate of depletion of Q in the volume V.

global or integral form of continuity equation

Theorem 13.2.5. In mathematical symbols, the conservation of a conserved physical quantity Q is written as

$$ \frac{dQ}{dt} = -\oint\!\!\oint_S \mathbf{J}_Q\cdot d\mathbf{a}, \tag{13.21} $$

which is the global or integral form of the continuity equation.

The minus sign ensures that positive flux gives rise to a depletion, and vice versa. The local or differential form of the continuity equation can be obtained as follows: The LHS of Equation (13.21) can be written as

$$ \frac{dQ}{dt} = \frac{d}{dt}\iiint_V \rho_Q\,dV = \iiint_V \frac{\partial\rho_Q}{\partial t}\,dV, $$

while the RHS, with the help of the divergence theorem, becomes

$$ -\oint\!\!\oint_S \mathbf{J}_Q\cdot d\mathbf{a} = -\iiint_V \nabla\cdot\mathbf{J}_Q\,dV. $$

Together they give

$$ \iiint_V \frac{\partial\rho_Q}{\partial t}\,dV = -\iiint_V \nabla\cdot\mathbf{J}_Q\,dV \quad\text{or}\quad \iiint_V\left(\frac{\partial\rho_Q}{\partial t} + \nabla\cdot\mathbf{J}_Q\right)dV = 0. $$

This relation is true for all volumes V. In particular, we can make the volume as small as we please. Then, the integral will be approximately the integrand times the volume. Since the volume is nonzero (but small), the only way that the product can be zero is for the integrand to vanish.


local (differential) form of continuity equation

Box 13.2.4. The differential form of the continuity equation is

$$ \frac{\partial\rho_Q}{\partial t} + \nabla\cdot\mathbf{J}_Q = 0. \tag{13.22} $$


10 In the theory of relativity mass by itself is not a conserved quantity, but mass in 
combination with energy is. 
Both integral and differential forms of the continuity equation have a wide range of applications in many areas of physics.

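Equation (13.22) is sometimes written in terms of $\rho_Q$ and the velocity. This is achieved by substituting $\rho_Q\mathbf{v}$ for $\mathbf{J}_Q$:

$$ \frac{\partial\rho_Q}{\partial t} + \nabla\cdot(\rho_Q\mathbf{v}) = 0 \quad\text{or}\quad \frac{\partial\rho_Q}{\partial t} + (\nabla\rho_Q)\cdot\mathbf{v} + \rho_Q\,\nabla\cdot\mathbf{v} = 0. $$

Using Cartesian coordinates, and recognizing the components of v as the rates dx/dt, dy/dt, dz/dt of a moving element of the fluid, we can write the sum of the first two terms as a total derivative:

$$ \frac{\partial\rho_Q}{\partial t} + (\nabla\rho_Q)\cdot\mathbf{v} = \frac{\partial\rho_Q}{\partial t} + \left(\frac{\partial\rho_Q}{\partial x},\frac{\partial\rho_Q}{\partial y},\frac{\partial\rho_Q}{\partial z}\right)\cdot\left(\frac{dx}{dt},\frac{dy}{dt},\frac{dz}{dt}\right) = \underbrace{\frac{\partial\rho_Q}{\partial t} + \frac{\partial\rho_Q}{\partial x}\frac{dx}{dt} + \frac{\partial\rho_Q}{\partial y}\frac{dy}{dt} + \frac{\partial\rho_Q}{\partial z}\frac{dz}{dt}}_{\text{total derivative}} = \frac{d\rho_Q}{dt}. $$

Thus the continuity equation can also be written as

$$ \frac{d\rho_Q}{dt} + \rho_Q\,\nabla\cdot\mathbf{v} = 0. \tag{13.23} $$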
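To see the two forms (13.22) and (13.23) agree, consider a one-dimensional illustration (an assumption made for simplicity, not an example from the text): the velocity field $v(x)=x$, which has $\nabla\cdot v=1$, and the density $\rho(x,t)=e^{-t}g(xe^{-t})$ for a smooth profile g (a Gaussian below). Direct substitution shows that this ρ satisfies $\partial\rho/\partial t+\partial(\rho v)/\partial x=0$. A fluid element follows $x(t)=x_0e^{t}$, along which $\rho=e^{-t}g(x_0)$, so $d\rho/dt=-\rho=-\rho\,\nabla\cdot v$. The sketch checks both statements with central differences.

```python
import math

def g(s):
    return math.exp(-s * s)          # smooth density profile (illustrative)

def rho(x, t):
    return math.exp(-t) * g(x * math.exp(-t))

def v(x):
    return x                         # velocity field with dv/dx = 1

def eulerian_residual(x, t, h=1e-5):
    """Central-difference estimate of d(rho)/dt + d(rho v)/dx, Eq. (13.22) in 1D."""
    drho_dt = (rho(x, t + h) - rho(x, t - h)) / (2 * h)
    dflux_dx = (rho(x + h, t) * v(x + h) - rho(x - h, t) * v(x - h)) / (2 * h)
    return drho_dt + dflux_dx

def lagrangian_rate(x0, t, h=1e-5):
    """Total derivative of rho following the fluid element x(t) = x0 e^t."""
    def along_path(s):
        return rho(x0 * math.exp(s), s)
    return (along_path(t + h) - along_path(t - h)) / (2 * h)

x0, t = 0.7, 0.3
x = x0 * math.exp(t)
print(eulerian_residual(x, t))              # close to 0
print(lagrangian_rate(x0, t), -rho(x, t))   # d(rho)/dt = -rho * (dv/dx)
```

This flow is compressible ($\nabla\cdot v\ne 0$), which is exactly the case in which the density of a moving element changes and the last term of (13.23) matters.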


Aside from Maxwell, two names are associated with vector analysis (completely detached from its quaternionic ancestors): Willard Gibbs and Oliver Heaviside.

Josiah Willard Gibbs’s father, also called Josiah Willard Gibbs, was professor of sacred literature at Yale University. The Gibbs family originated in Warwickshire, England, and moved from there to Boston in 1658.

Gibbs was educated at the local Hopkins Grammar School, where he was described as friendly but withdrawn. His total commitment to academic work, together with rather delicate health, meant that he was little involved with the social life of the school. In 1854 he entered Yale College, where he won prizes for excellence in Latin and mathematics.

Remaining at Yale, Gibbs began to undertake research in engineering, writing a thesis in which he used geometrical methods to study the design of gears. When he was awarded a doctorate from Yale in 1863, it was the first doctorate of engineering to be conferred in the United States. After this he served as a tutor at Yale for three years, teaching Latin for the first two years and then Natural Philosophy in the third year. He was not short of money, however: his father had died in 1861 and, since his mother had also died, Gibbs and his two sisters inherited a fair amount of money.

From 1866 to 1869 Gibbs studied in Europe. He went with his sisters and spent the winter of 1866-67 in Paris, followed by a year in Berlin, and finally spent 1868-69 in Heidelberg. In Heidelberg he was influenced by Kirchhoff and Helmholtz.

Gibbs returned to Yale in June 1869, where two years later he was appointed professor of mathematical physics. Rather surprisingly, his appointment to the professorship at Yale came before he had published any work. Gibbs was actually a physical chemist, and his major publications were in chemical equilibrium and thermodynamics. From 1873 to 1878, he wrote several important papers on thermodynamics, including the notion of what is now called the Gibbs potential.

Josiah Willard Gibbs 1839-1903

Oliver Heaviside 1850-1925

Gibbs’s work on vector analysis was in the form of printed notes for the use of 
his own students written in 1881 and 1884. It was not until 1901 that a properly 
published version appeared, prepared for publication by one of his students. Using 
ideas of Grassmann, a high school teacher who also worked on the generalization of 
complex numbers to three dimensions and invented what is now called Grassmann 
algebra, Gibbs produced a system much more easily applied to physics than that of 
Hamilton. 

His work on statistical mechanics was also important, providing a mathematical 
framework for the earlier work of Maxwell on the same subject. In fact his last 
publication was Elementary Principles in Statistical Mechanics, which is a beautiful 
account putting statistical mechanics on a firm mathematical foundation. 

Except for his early years and the three years in Europe, Gibbs spent his whole 
life living in the same house which his father had built only a short distance from the 
school Gibbs had attended, the college at which he had studied, and the university 
where he worked all his life. 

Oliver Heaviside caught scarlet fever when he was a young child, and this affected his hearing. This was to have a major effect on his life, making his childhood unhappy and his relations with other children difficult. However, his school results were rather good, and in 1865 he was placed fifth from 500 pupils.

Academic subjects seemed to hold little attraction for Heaviside, however, and at age 16 he left school. Perhaps he was more disillusioned with school than with learning, since he continued to study after leaving school. In particular, he learnt Morse code and studied electricity and foreign languages, notably Danish and German. He was aiming at a career as a telegrapher, and in this he was advised and helped by his uncle Charles Wheatstone (after whom the Wheatstone bridge, a piece of electrical apparatus, is named).

In 1868 Heaviside went to Denmark and became a telegrapher. He progressed 
quickly in his profession and returned to England in 1871 to take up a post in 
Newcastle upon Tyne in the office of the Great Northern Telegraph Company which 
dealt with overseas traffic. 

Heaviside became increasingly deaf, but he worked on his own researches into electricity. While still working as chief operator in Newcastle he began to publish papers on electricity. One of these was of sufficient interest to Maxwell that he mentioned the results in the second edition of his Treatise on Electricity and Magnetism. Maxwell’s treatise fascinated Heaviside, and he gave up his job as a telegrapher and devoted his time to the study of the work. Although his interest in and understanding of this work were deep, Heaviside was not interested in rigor. Nevertheless, he was able to develop important methods in vector analysis in his investigations.

His operational calculus, developed between 1880 and 1887, caused much controversy. Burnside rejected one of Heaviside’s papers on the operational calculus, which he had submitted to the Proceedings of the Royal Society, on the grounds that it “contained errors of substance and had irredeemable inadequacies in proof.” Tait championed quaternions against the vector methods of Heaviside and Gibbs and sent frequent letters to Nature attacking Heaviside’s methods. Eventually, however, his work was recognized, and in 1891 he was elected a Fellow of the Royal Society. Whittaker rated Heaviside’s operational calculus as one of the three most important discoveries of the late nineteenth century.
Heaviside seemed to become more and more bitter as the years went by. In 1908
he moved to Torquay where he showed increasing evidence of a persecution complex. 
His neighbors related stories of Heaviside as a strange and embittered hermit who 
replaced his furniture with granite blocks which stood about in the bare rooms like 
the furnishings of some Neolithic giant. Through those fantastic rooms he wandered, 
growing dirtier and dirtier, with one exception: His nails were always exquisitely 
manicured, and painted a glistening cherry pink. 


13.3 Problems 

13.1. Using (13.6) find the flux of the vector field $\mathbf{A} = kx^2\hat{\mathbf{e}}_z$ through the
portion of the sphere of radius a centered at the origin lying in the first octant 
of a Cartesian coordinate system. 

13.2. Using (13.6) find the flux of the vector field $\mathbf{A} = y\hat{\mathbf{e}}_x + 3z\hat{\mathbf{e}}_y - 2x\hat{\mathbf{e}}_z$
through the portion of the plane x + 2y — 2>z = 5 lying in the first octant of a 
Cartesian coordinate system. 

13.3. A vector field is given by A = r. Using (13.6) find the flux of this 
vector field through the upper hemisphere centered at the origin. Verify your 
answer by calculating the flux using (the much easier) spherical coordinates. 

13.4. Find the flux of the vector field $\mathbf{A} = x^2\hat{\mathbf{e}}_x + y^2\hat{\mathbf{e}}_y + z^2\hat{\mathbf{e}}_z$ through the
portion of the plane x + y + z = 1 lying in the first octant of a Cartesian 
coordinate system. 

13.5. Using (13.6), find the flux of the vector field $\mathbf{A} = k\mathbf{r}/r^3$ through the
upper hemisphere centered at the origin. Verify your answer by calculating 
the flux using spherical coordinates. 

13.6. Find the flux of the vector field $\mathbf{A} = y\mathbf{e}_y + a\mathbf{e}_z$ through the portion of
the paraboloid $z = b^2 - x^2 - y^2$ above the $xy$-plane.

13.7. Derive Equation (13.9). 

13.8. Find the flux of the vector

$$\mathbf{A} = \frac{6ka^2y}{\sqrt{x^2+y^2+a^2}}\,\mathbf{e}_x + \frac{3ka^2z}{\sqrt{y^2+z^2+4a^2}}\,\mathbf{e}_y + \frac{2ka^2x}{\sqrt{x^2+z^2+9a^2}}\,\mathbf{e}_z$$

through the surface of the box shown in Figure 13.8: 

(a) by integrating over the surface of the box; and 

(b) by using the divergence theorem and integrating over the volume of the 
box. 

13.9. The gravitational field of a certain mass distribution is given by 

$$\mathbf{g}(x,y,z) = -kG\left\{(x^3y^2z^2)\,\mathbf{e}_x + (x^2y^3z^2)\,\mathbf{e}_y + (x^2y^2z^3)\,\mathbf{e}_z\right\},$$

where k is a constant and G is the universal gravitational constant: 

(a) Find the mass density of the source of this field. 

(b) What is the total mass in a cube of side 2a centered about the origin? 
Figure 13.8: The box of Problem 13.8.


13.10. The gravitational field of a certain mass distribution in the first octant 
of a Cartesian coordinate system is given by 


g (x,y,z) 


GM rc- (x+y+z)/a , 


where r is the position vector, M and a are constants, and G is the universal 
gravitational constant. 

(a) Find the mass density of the source of this field. 

(b) What is the total mass in a cube of side a with one corner at the origin 
and sides parallel to the axes? 


13.11. The electrostatic potential of a certain charge distribution in Cartesian 
coordinates is given by 


$$\Phi(x,y,z) = \frac{V_0}{a^3}\,xyz\,e^{-(x+y+z)/a},$$

where $V_0$ and $a$ are constants.

(a) Find the electric field $\mathbf{E} = -\nabla\Phi$ of this potential.

(b) Calculate the charge density of the source of this field. 

(c) What is the total charge in a cube of side a with one corner at the origin 
and sides parallel to the axes? Write your answer as a numerical multiple of
$\epsilon_0 V_0 a$.

13.12. The electric field of a charge distribution is given by 

$$\mathbf{E} = \frac{E_0}{a^4}\,xyz\,e^{-(x+y+z)/a}\,\mathbf{r}.$$

(a) Write the Cartesian components of this electric field completely in Cartesian
coordinates.

(b) Calculate the volume charge density giving rise to this field. 

(c) Find the total charge in a cube of side $a$ whose sides are parallel to the axes
and one of whose corners is at the origin. Write your answer as a numerical
multiple of $\epsilon_0 E_0 a^2$.
13.13. The velocity of a physical quantity $Q$ is radial and given by $\mathbf{v} = k\mathbf{r}$,
where $k$ is a constant. Show that if the density $\rho_Q$ is independent of position,
then it is given by

$$\rho_Q(t) = \rho_{0Q}\,e^{-3kt},$$

where $\rho_{0Q}$ is the initial density of $Q$.




Chapter 14 

Line Integral and Curl 


The last chapter introduced the concept of flux and the surface integral associated
with it. Flux uses the directional property of a vector field to have it pierce an
element of area. The directional property can also naturally assign a varying
direction along a line: one can consider how a vector field changes direction
as one moves along a curve in space. This change leads to a new kind of
integration and differentiation of vector fields. The integration leads to the
notion of a line integral and the associated differentiation to the concept of curl.


14.1 The Line Integral 

The prime example of a line integral is the work done by a force. Consider
the force field $\mathbf{F}(\mathbf{r})$ acting on an object and imagine the object being moved
by a small displacement $\Delta\mathbf{r}$. Then the work done by the force in effecting this
displacement is defined as

$$\Delta W = \mathbf{F}(\mathbf{r})\cdot\Delta\mathbf{r},$$

where it is assumed that $\mathbf{F}(\mathbf{r})$ is (approximately) constant during the displacement.

To calculate the work for a finite displacement, such as the one shown
in Figure 14.1, we break up the displacement into $N$ small segments, calculate
the work for each segment, and add all contributions to obtain
$W \approx \sum_{i=1}^{N}\mathbf{F}(\mathbf{r}_i)\cdot\Delta\mathbf{r}_i$. The approximation becomes exact in the limit
in which every $\Delta\mathbf{r}_i$ goes to zero and $N$ goes to infinity. Then we have

$$W = \int_{P_1}^{P_2}\mathbf{F}(\mathbf{r})\cdot d\mathbf{r} = \int_C \mathbf{F}\cdot d\mathbf{r}, \qquad (14.1)$$

where $C$ stands for the particular curve along which the object is displaced. This
equation is, by definition, the line integral of the force field $\mathbf{F}$. In this particular
case it is the work done by $\mathbf{F}$ in moving the object from $P_1$ to $P_2$. Of course, we can
apply the line integral to any vector field, not just force. In electromagnetic
theory, for example, the line integrals of the electric and magnetic fields play
a central role.

line integral
defined

Figure 14.1: The line integral of a vector field $\mathbf{F}$ from $P_1$ to $P_2$.

The most general way to calculate a line integral is through the parametric
equations of the curve. Thus, if the Cartesian set of parametric equations of
the curve is

x = f(t), y = g(t), z = h(t), 

then the components of the vector field $\mathbf{A}$ will be functions of a single variable
$t$ obtained by substitution:

$$A_x(x,y,z) = A_x(f(t),g(t),h(t)) = \mathcal{F}(t),$$
$$A_y(x,y,z) = A_y(f(t),g(t),h(t)) = \mathcal{G}(t),$$
$$A_z(x,y,z) = A_z(f(t),g(t),h(t)) = \mathcal{H}(t),$$

and the components of $d\mathbf{r}$ are

$$dx = f'(t)\,dt, \qquad dy = g'(t)\,dt, \qquad dz = h'(t)\,dt.$$

line integral in
terms of the
parametric
equations of the
curve

Then the line integral of $\mathbf{A}$ can be written as

$$\int_C \mathbf{A}\cdot d\mathbf{r} = \int_C (A_x\,dx + A_y\,dy + A_z\,dz) = \int_a^b \left\{\mathcal{F}(t)f'(t) + \mathcal{G}(t)g'(t) + \mathcal{H}(t)h'(t)\right\}dt, \qquad (14.2)$$

where $t = a$ and $t = b$ designate the initial and final points of the curve,
respectively. Other coordinate systems can be handled similarly. Instead of
giving a general formula for these coordinate systems, we present an example
using cylindrical coordinates.
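Equation (14.2) turns a line integral into an ordinary integral over the parameter $t$, which also makes it easy to approximate numerically. The following is a minimal Python sketch, not part of the original text: the helper names `simpson` and `line_integral` are our own, composite Simpson's rule is just one convenient quadrature choice, and the caller is assumed to supply the derivatives $f'(t)$, $g'(t)$, $h'(t)$ analytically.

```python
import math
from typing import Callable, Tuple

Vec3 = Tuple[float, float, float]


def simpson(g: Callable[[float], float], a: float, b: float, n: int = 1000) -> float:
    """Composite Simpson's rule for the integral of g over [a, b] (n is forced to be even)."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3


def line_integral(A: Callable[[float, float, float], Vec3],
                  r: Callable[[float], Vec3],
                  dr: Callable[[float], Vec3],
                  a: float, b: float, n: int = 1000) -> float:
    """Approximate the line integral of A along r(t), a <= t <= b, using Eq. (14.2).

    A  : (x, y, z) -> (A_x, A_y, A_z)
    r  : t -> (f(t), g(t), h(t))      parametric equations of the curve
    dr : t -> (f'(t), g'(t), h'(t))   their derivatives
    """
    def integrand(t: float) -> float:
        Ax, Ay, Az = A(*r(t))          # the substituted components F(t), G(t), H(t)
        fp, gp, hp = dr(t)
        return Ax * fp + Ay * gp + Az * hp
    return simpson(integrand, a, b, n)


# Usage: A = -y e_x + x e_y once around the unit circle; the integrand is 1, so the result is 2*pi.
circ = line_integral(lambda x, y, z: (-y, x, 0.0),
                     lambda t: (math.cos(t), math.sin(t), 0.0),
                     lambda t: (-math.sin(t), math.cos(t), 0.0),
                     0.0, 2 * math.pi)
print(circ)
```

Supplying $r'(t)$ explicitly keeps the sketch close to (14.2); a numerical derivative would also work but adds a second source of error.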

Example 14.1.1. Consider the vector field given by 


$$\mathbf{A} = c_1 z\varphi\,\mathbf{e}_\rho + c_2\rho z\,\mathbf{e}_\varphi + c_3\rho\varphi\,\mathbf{e}_z,$$

where $c_1$, $c_2$, and $c_3$ are constants. We want to calculate the line integral of this
field, starting at $z = 0$, along one turn of a uniformly wound helix of radius $a$ whose
windings are separated by a constant distance $b$ (see Figure 14.2). The parametric
equation of this helix in cylindrical coordinates is

$$\rho = f(t) = a, \qquad \varphi = g(t) = t, \qquad z = h(t) = \frac{b}{2\pi}t.$$

Notice that as $\varphi = t$ changes by $2\pi$, the height (i.e., $z$) changes by $b$ as required.

Figure 14.2: The helical path for calculating the line integral.
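Substituting for the three coordinates in terms of $t$ in the expression for $\mathbf{A}$, we obtain

$$\mathbf{A} = (A_\rho, A_\varphi, A_z) = \left(c_1\frac{b}{2\pi}t^2,\; c_2a\frac{b}{2\pi}t,\; c_3at\right).$$

Similarly,

$$d\mathbf{r} = (d\rho, \rho\,d\varphi, dz) = (f'(t), f(t)g'(t), h'(t))\,dt = \left(0, a, \frac{b}{2\pi}\right)dt,$$

so that

$$\int_C \mathbf{A}\cdot d\mathbf{r} = \int_0^{2\pi}\left\{A_\rho f'(t) + A_\varphi f(t)g'(t) + A_z h'(t)\right\}dt = \int_0^{2\pi}\left\{0 + c_2a^2\frac{b}{2\pi}t + c_3\frac{b}{2\pi}at\right\}dt = \pi ab(c_2a + c_3).$$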
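As a numerical cross-check of Example 14.1.1, the sketch below (ours, not the author's) evaluates the same integral with Simpson's rule in cylindrical components, using $d\mathbf{r} = (0, a, b/2\pi)\,dt$ from the example. The numerical values of $a$, $b$, $c_1$, $c_2$, $c_3$ are arbitrary choices made only for illustration.

```python
import math


def simpson(g, lo, hi, n=1000):
    """Composite Simpson's rule on [lo, hi]; n must be even."""
    h = (hi - lo) / n
    s = g(lo) + g(hi)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * g(lo + i * h)
    return s * h / 3


# Arbitrary illustrative constants (not from the text).
a, b = 2.0, 3.0
c1, c2, c3 = 0.5, 1.5, -0.7


def integrand(t):
    rho, phi, z = a, t, b * t / (2 * math.pi)            # the helix of Example 14.1.1
    A_rho, A_phi, A_z = c1 * z * phi, c2 * rho * z, c3 * rho * phi
    d_rho, rho_dphi, dz = 0.0, a, b / (2 * math.pi)       # dr = (d rho, rho d phi, dz) per unit dt
    return A_rho * d_rho + A_phi * rho_dphi + A_z * dz


numeric = simpson(integrand, 0.0, 2 * math.pi)
closed_form = math.pi * a * b * (c2 * a + c3)
print(numeric, closed_form)
```

Because $d\rho = 0$ on the helix, $c_1$ drops out of both the numerical and the closed-form result.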


Example 14.1.2. Consider the vector field $\mathbf{A} = K(xy^2\mathbf{e}_x + x^2y\mathbf{e}_y)$. We want
to evaluate the line integral of this field from the origin to the point $(a, a)$ in the
$xy$-plane along three different paths (i), (ii), and (iii), as shown in Figure 14.3. Since
the vector field is independent of $z$ and the paths all lie in the $xy$-plane, we ignore
$z$ completely.

The first path is the straight line $y = x$. A convenient parameterization is $x = at$,
$y = at$ with $0 \le t \le 1$. Along this path the components of $\mathbf{A}$ become

$$A_x = Kxy^2 = K(at)(at)^2 = Ka^3t^3, \qquad A_y = Kx^2y = K(at)^2(at) = Ka^3t^3.$$

Furthermore, taking the differentials of $x$ and $y$, we obtain $dx = a\,dt$ and $dy = a\,dt$.
Thus,

$$\int_C \mathbf{A}\cdot d\mathbf{r} = \int_{(0,0)}^{(a,a)}(A_x\,dx + A_y\,dy) = K\int_0^1\left[(a^3t^3)\,a\,dt + (a^3t^3)\,a\,dt\right] = 2Ka^4\int_0^1 t^3\,dt = \frac{Ka^4}{2}.$$

Figure 14.3: The three paths joining the origin to the point $(a, a)$. Path (iv) is to
illustrate the importance of parameterization.


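Although parameterization is very useful, systematic, and highly recommended,
it is not always necessary. We calculate the line integral along path (ii), given by
$y = x^2/a$, without using parameterization. All we have to notice is that all the $y$'s
are to be replaced by $x^2/a$ [and therefore, $dy$ by $(2x/a)\,dx$]. Thus,

$$A_x = Kxy^2 = Kx\left(\frac{x^2}{a}\right)^2 = K\frac{x^5}{a^2}, \qquad A_y = Kx^2y = Kx^2\left(\frac{x^2}{a}\right) = K\frac{x^4}{a}.$$

The line integral can now be evaluated easily:

$$\int_{(0,0)}^{(a,a)}(A_x\,dx + A_y\,dy) = K\int_0^a\left[\frac{x^5}{a^2}\,dx + \frac{x^4}{a}\cdot\frac{2x}{a}\,dx\right] = \frac{3K}{a^2}\int_0^a x^5\,dx = \frac{Ka^4}{2}.$$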

Finally, we calculate the line integral along the quarter of a circle. For this calculation,
we return to the parameterization technique, because it eases the integration.
A simple parameterization is

$$x = a - a\cos t, \qquad y = a\sin t, \qquad 0 \le t \le \frac{\pi}{2},$$

with $dx = a\sin t\,dt$ and $dy = a\cos t\,dt$. This yields

$$\begin{aligned} A_x\,dx + A_y\,dy &= K[(a - a\cos t)a^2\sin^2 t]\,a\sin t\,dt + K[(a - a\cos t)^2a\sin t]\,a\cos t\,dt\\ &= Ka^4[(1 - \cos t)(1 - \cos^2 t) + (1 - \cos t)^2\cos t]\sin t\,dt\\ &= Ka^4(1 - 3\cos^2 t + 2\cos^3 t)\sin t\,dt. \end{aligned}$$


This is now integrated to give the line integral:

$$\int_{(0,0)}^{(a,a)}(A_x\,dx + A_y\,dy) = Ka^4\int_0^{\pi/2}(1 - 3\cos^2 t + 2\cos^3 t)\sin t\,dt = Ka^4\left[-\cos t + \cos^3 t - \tfrac{1}{2}\cos^4 t\right]_0^{\pi/2} = \frac{Ka^4}{2}.$$


The fact that the three line integrals yield the same result may seem surprising. 
However, as we shall see shortly, it is a property shared by a special group of vector 
fields of which A is a member. ■ 
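The path independence found in Example 14.1.2 is easy to confirm numerically. In the sketch below (an illustration, not the author's method) each path is parameterized over $t$; for path (ii) we use $x = at$, $y = at^2$, which is our own choice since the text handles that path without a parameter. The values of $K$ and $a$ are arbitrary.

```python
import math


def simpson(g, lo, hi, n=1000):
    """Composite Simpson's rule on [lo, hi]; n must be even."""
    h = (hi - lo) / n
    s = g(lo) + g(hi)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * g(lo + i * h)
    return s * h / 3


K, a = 1.0, 1.3  # arbitrary illustrative constants


def A(x, y):
    return K * x * y**2, K * x**2 * y      # the field of Example 14.1.2


def path_integral(r, dr, t0, t1):
    """Integral of A_x dx + A_y dy along r(t), t0 <= t <= t1, with (dx, dy) = r'(t) dt."""
    def integrand(t):
        Ax, Ay = A(*r(t))
        dx, dy = dr(t)
        return Ax * dx + Ay * dy
    return simpson(integrand, t0, t1)


paths = {
    "(i)   y = x": (lambda t: (a * t, a * t),
                    lambda t: (a, a), 0.0, 1.0),
    "(ii)  y = x^2/a": (lambda t: (a * t, a * t**2),
                        lambda t: (a, 2 * a * t), 0.0, 1.0),
    "(iii) quarter circle": (lambda t: (a - a * math.cos(t), a * math.sin(t)),
                             lambda t: (a * math.sin(t), a * math.cos(t)), 0.0, math.pi / 2),
}
results = {name: path_integral(*spec) for name, spec in paths.items()}
for name, value in results.items():
    print(f"{name:22s} {value:.10f}   (K a^4/2 = {K * a**4 / 2:.10f})")
```

All three values agree with $Ka^4/2$ to within the quadrature error, as the example found analytically.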






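Many a time, parameterization makes life a lot easier! Suppose we want
to calculate the line integral of a vector field along path (iv) of Figure 14.3.
First let us attempt to calculate the line integral using the coordinates directly.
Along path (iv), $d\mathbf{r} = -\mathbf{e}_x\,dx$, so $\mathbf{A}\cdot d\mathbf{r} = -A_x\,dx$. Then

$$\int_{(a,a)}^{(0,a)}\mathbf{A}\cdot d\mathbf{r} = -\int_a^0 A_x\,dx = \int_0^a A_x\,dx.$$

Thus, if $A_x > 0$ (try $A_x = x^2$), the integral will be positive. But this is wrong:
a positive $A_x$ should yield a negative $\mathbf{A}\cdot d\mathbf{r}$ because the two vectors point in
opposite directions! The error is that the direction of travel has been counted twice:
once by writing $d\mathbf{r} = -\mathbf{e}_x\,dx$, and again by letting $x$ run from $a$ down to $0$.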

With parameterization, this problem is eliminated. A parameterization
that represents path (iv) is

$$x = a(1 - t), \qquad y = a, \qquad 0 \le t \le 1.$$

Clearly, $t = 0$ corresponds to the beginning of path (iv) and $t = 1$ to its
endpoint. The parameterization automatically gives $dx = -a\,dt$ and $dy = 0$.
For instance, the vector field of Example 14.1.2 yields

$$\int_{(a,a)}^{(0,a)}\mathbf{A}\cdot d\mathbf{r} = \int_0^1 Ka(1 - t)a^2(-a\,dt) = -Ka^4\int_0^1(1 - t)\,dt = -\frac{1}{2}Ka^4.$$

This has the correct sign because $A_x$ is positive and the direction of integration
is negative. The other method would have given a positive result!
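The sign pitfall for path (iv) can be reproduced in a few lines. This is a hypothetical sketch using the field of Example 14.1.2 with arbitrary $K$ and $a$: `param` integrates over the parameter $t$ exactly as above, while `naive` mimics the flawed coordinate calculation.

```python
def simpson(g, lo, hi, n=1000):
    """Composite Simpson's rule on [lo, hi]; n must be even."""
    h = (hi - lo) / n
    s = g(lo) + g(hi)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * g(lo + i * h)
    return s * h / 3


K, a = 1.0, 1.3  # arbitrary illustrative constants


def A_x(x, y):
    # x-component of the field of Example 14.1.2 (A_y does not contribute since dy = 0)
    return K * x * y**2


# Parameterized path (iv): x = a(1 - t), y = a, 0 <= t <= 1, so dx = -a dt and dy = 0.
param = simpson(lambda t: A_x(a * (1 - t), a) * (-a), 0.0, 1.0)

# The flawed shortcut: writing dr = -e_x dx while also letting x run from a down to 0
# flips the sign twice, which amounts to integrating A_x dx from 0 to a.
naive = simpson(lambda x: A_x(x, a), 0.0, a)

print(param, naive, -K * a**4 / 2)
```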


14.2 Curl of a Vector Field and Stokes’ Theorem

Line integrals around a closed path are of special interest. For example, if
the velocity vector of a fluid has a nonzero integral around a closed path, the
fluid must be circulating around that path, and a whirlpool must reside inside
the closed path. It is remarkable that such a concrete, everyday idea can be
applied verbatim, with proven success and relevance, to much more abstract and
sophisticated concepts such as electromagnetic fields. Thus, for a vector
field $\mathbf{A}$ and a closed path $C$, we denote the line integral as

$$\oint_C \mathbf{A}\cdot d\mathbf{r},$$

where the circle on the integral sign indicates that the path is closed and $C$
denotes the particular path taken.

In our discussion of divergence and flux, we encountered Equation (13.11),
where an integral over a volume $V$ was related to an integral over its boundary,
the surface $S$. This remarkable property has an analog in one lower dimension:
any closed curve bounds a surface inside it. Is it possible to connect the
line integral over the closed curve to a surface integral over the surface? The
answer is yes, but we have to be careful here. What do we mean by “the” surface?
A given closed curve may bound many different surfaces, as Figure 14.4
shows. It turns out that this freedom, which was absent in the divergence
case,¹ is irrelevant: the relation holds for any surface whose boundary is
the given curve.

Figure 14.4: There is no “the” surface having $C$ as its boundary. Both $S_1$ and $S_2$, as
well as a multitude of others, are such surfaces.

Let us now develop the analog of the divergence theorem for closed line
integrals. To begin, we consider a small closed rectangular path with a unit
normal $\mathbf{e}_n$, which is related to the direction of traversing the path by the
right-hand rule (RHR):

Right-hand rule
(RHR) rules here!

Box 14.2.1. (The Right-Hand Rule). Curl the fingers of your right
hand in the direction of integration along the curve; your thumb then
points in the direction of $\mathbf{e}_n$.


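Without loss of generality we assume that the rectangle is parallel to the $xy$-plane
with sides parallel to the $x$-axis and the $y$-axis and that $\mathbf{e}_n$ is parallel
to the $z$-axis (see Figure 14.5). The line integral can be written as

$$\oint_C \mathbf{A}\cdot d\mathbf{r} = \int_a^b \mathbf{A}\cdot d\mathbf{r} + \int_b^c \mathbf{A}\cdot d\mathbf{r} + \int_c^d \mathbf{A}\cdot d\mathbf{r} + \int_d^a \mathbf{A}\cdot d\mathbf{r}.$$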


¹ It should be clear that we cannot change the shape of the volume enclosed in $S$ without
changing $S$ itself. This rigidity is due to the maximality of the dimension of the enclosed
region: a volume is a three-dimensional object, and three is the maximum dimension we
have. Theories with more than three dimensions would allow a deformability similar to the
one discussed above.

Figure 14.5: A closed rectangular path parallel to the $xy$-plane with center at $(x, y, z)$.

We do the first integral in detail; the rest are similar. Along $ab$ the element
of displacement $d\mathbf{r}$ is always in the positive $x$-direction and has magnitude $dx$,
so it can be written as $d\mathbf{r} = \mathbf{e}_x\,dx$. Thus, the first integral on the RHS above
becomes

$$\int_a^b \mathbf{A}\cdot d\mathbf{r} = \int_a^b \mathbf{A}_1\cdot d\mathbf{r}_1 = \int_a^b \mathbf{A}_1\cdot(\mathbf{e}_x\,dx) = \int_a^b A_{1x}\,dx,$$


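where, as before, the subscript 1 indicates that we have to evaluate $\mathbf{A}$ at the
midpoint of $ab$ and the subscript $x$ denotes the $x$-component. Now, since $ab$
is small and the angle between $\mathbf{A}$ and $d\mathbf{r}$ does not change appreciably on $ab$,²
we can approximate the integral with $A_{1x}\,\overline{ab}$ and write

$$\int_a^b \mathbf{A}\cdot d\mathbf{r} \approx A_{1x}\,\overline{ab} = A_{1x}\,\Delta x = A_x\big(\underbrace{x,\, y - \tfrac{\Delta y}{2},\, z}_{\text{coordinates of midpoint of } ab}\big)\,\Delta x \approx \left\{A_x(x,y,z) - \frac{\Delta y}{2}\frac{\partial A_x}{\partial y}\right\}\Delta x,$$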

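where in the last step we used the Taylor expansion of $A_x$. Similarly, we can
write

$$\int_c^d \mathbf{A}\cdot d\mathbf{r} = \int_c^d \mathbf{A}_2\cdot d\mathbf{r}_2 = \int_c^d \mathbf{A}_2\cdot(-\mathbf{e}_x\,dx) = -\int_c^d A_{2x}\,dx \approx -A_{2x}\,\overline{cd} = -A_{2x}\,\Delta x = -A_x\big(\underbrace{x,\, y + \tfrac{\Delta y}{2},\, z}_{\text{coordinates of midpoint of } cd}\big)\,\Delta x \approx -\left\{A_x(x,y,z) + \frac{\Delta y}{2}\frac{\partial A_x}{\partial y}\right\}\Delta x.$$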

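Adding the contributions from sides $ab$ and $cd$ yields

$$\int_a^b \mathbf{A}\cdot d\mathbf{r} + \int_c^d \mathbf{A}\cdot d\mathbf{r} \approx -\frac{\partial A_x}{\partial y}\,\Delta x\,\Delta y.$$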


² This condition is essential, because a rapidly changing angle implies a rapidly changing
component $A_{1x}$, which is not suitable for the approximation to follow.
curl of a vector
field defined


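The contributions from the other two sides of the rectangle can also be
calculated:

$$\begin{aligned} \int_b^c \mathbf{A}\cdot d\mathbf{r} + \int_d^a \mathbf{A}\cdot d\mathbf{r} &\approx A_{3y}\,\Delta y - A_{4y}\,\Delta y = A_y\left(x + \tfrac{\Delta x}{2}, y, z\right)\Delta y - A_y\left(x - \tfrac{\Delta x}{2}, y, z\right)\Delta y\\ &\approx \left\{A_y(x,y,z) + \frac{\Delta x}{2}\frac{\partial A_y}{\partial x}\right\}\Delta y - \left\{A_y(x,y,z) - \frac{\Delta x}{2}\frac{\partial A_y}{\partial x}\right\}\Delta y = \frac{\partial A_y}{\partial x}\,\Delta x\,\Delta y. \end{aligned}$$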


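The sum of these two equations gives the total contribution:

$$\oint_C \mathbf{A}\cdot d\mathbf{r} \approx \left(\frac{\partial A_y}{\partial x} - \frac{\partial A_x}{\partial y}\right)\Delta x\,\Delta y. \qquad (14.3)$$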

Let us look at Equation (14.3) more closely. The expression in parentheses
can be interpreted as the $z$-component of the cross product of the gradient
operator $\nabla$ with $\mathbf{A}$. In fact, using the mnemonic determinant form of the
vector product, we can write



/e x 

e y 

e z \ 

det 

d 

d 

d 

dx 

dy 

dz 


\ ^4x 

Ay 

A Z J 

(dA z 

dAy\ 

G x -J- 

\ dy 

dz ) 





dAy _ dAA „ 

dx dy ) 


This cross product is called the curl of A and is an important quantity in 
vector analysis. We will look more closely at it later. At this point, however, 
we are interested only in its definition as applied in Equation (14.3). The 
RHS of that equation can be written as 


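$$\left(\frac{\partial A_y}{\partial x} - \frac{\partial A_x}{\partial y}\right)\Delta x\,\Delta y = (\nabla\times\mathbf{A})_z\,\Delta x\,\Delta y = (\nabla\times\mathbf{A})\cdot\mathbf{e}_z\,\Delta a,$$

where $\Delta a = \Delta x\,\Delta y$ is the area of the rectangle. Noting that $\mathbf{e}_z$ is in the
direction normal to the area, we can replace it with $\mathbf{e}_n$. Therefore, we can
write Equation (14.3) as

$$\oint_C \mathbf{A}\cdot d\mathbf{r} \approx (\nabla\times\mathbf{A})\cdot\mathbf{e}_n\,\Delta a = (\nabla\times\mathbf{A})\cdot\Delta\mathbf{a}. \qquad (14.4)$$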


Equation (14.4) states that for a small rectangular path $C$ the closed line
integral is approximately equal to the normal component of the curl of $\mathbf{A}$ evaluated
at the center of the rectangle times the area of the rectangle. This statement does
not depend on the choice of coordinate system. In fact, any rectangle (or any
closed planar loop) defines a plane, and we are at liberty to designate that
plane the $xy$-plane. Thus, we can define the curl of a vector field this way:

Definition 14.2.1. Given a small closed curve $C$, calculate the line integral
of $\mathbf{A}$ around it and divide the result by the area $\Delta a$ enclosed by $C$. The component
of the curl of $\mathbf{A}$ along the unit normal to the area is given by

$$\text{curl}\,\mathbf{A}\cdot\mathbf{e}_n = \nabla\times\mathbf{A}\cdot\mathbf{e}_n = \lim_{\Delta a\to 0}\frac{\oint_C \mathbf{A}\cdot d\mathbf{r}}{\Delta a}. \qquad (14.5)$$

The direction of $\mathbf{e}_n$ is related to the sense of integration via the right-hand
rule.

In Equation (14.5) we are assuming that the area is flat. This is always
possible by taking the curve small enough. Definition 14.2.1 is completely
independent of the coordinate system, and we shall use it to derive expressions
for the curl of vector fields in spherical and cylindrical coordinates as well.
The reader should be aware that the notation $\nabla\times\mathbf{A}$ is just that, a notation,
and, except in Cartesian coordinates, should not be considered as a cross
product.
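Definition 14.2.1 suggests a direct numerical experiment: shrink a small loop around a point and watch the ratio of circulation to area. The sketch below is illustrative only; the field $\mathbf{A} = -y^3\mathbf{e}_x + x^3\mathbf{e}_y$, with $(\nabla\times\mathbf{A})_z = 3x^2 + 3y^2$, and the point $(0.4, -0.3)$ are our own choices, and the loop is a square traversed counterclockwise so that $\mathbf{e}_n = \mathbf{e}_z$.

```python
def simpson(g, lo, hi, n=200):
    """Composite Simpson's rule on [lo, hi]; n must be even."""
    h = (hi - lo) / n
    s = g(lo) + g(hi)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * g(lo + i * h)
    return s * h / 3


def A(x, y):
    # Illustrative planar field (not from the text): A = -y^3 e_x + x^3 e_y.
    return -y**3, x**3


def circulation_square(x0, y0, s):
    """Counterclockwise line integral of A around the square of side s centred at (x0, y0)."""
    corners = [(x0 - s / 2, y0 - s / 2), (x0 + s / 2, y0 - s / 2),
               (x0 + s / 2, y0 + s / 2), (x0 - s / 2, y0 + s / 2)]
    total = 0.0
    for (xa, ya), (xb, yb) in zip(corners, corners[1:] + corners[:1]):
        # Straight side: r(t) = (xa, ya) + t (xb - xa, yb - ya), 0 <= t <= 1.
        def integrand(t):
            Ax, Ay = A(xa + t * (xb - xa), ya + t * (yb - ya))
            return Ax * (xb - xa) + Ay * (yb - ya)
        total += simpson(integrand, 0.0, 1.0)
    return total


x0, y0 = 0.4, -0.3
curl_z = 3 * x0**2 + 3 * y0**2      # dA_y/dx - dA_x/dy at (x0, y0)
for s in (0.5, 0.1, 0.02):
    ratio = circulation_square(x0, y0, s) / s**2
    print(f"side {s:<5} circulation/area = {ratio:.8f}   (curl A)_z = {curl_z:.8f}")
```

For this cubic field the ratio works out to exactly $3(x_0^2 + y_0^2) + s^2/2$, so it approaches the curl like $s^2$ as the square shrinks, in line with the limit in (14.5).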

coordinate
independent
definition of curl

from small
rectangles to large
loops

Figure 14.6: An arbitrary surface with the curve $C$ as its boundary. The sum of the
line integrals around the rectangular paths shown is equal to the line integral around $C$.

³ The direction of the contour with one side on the curve $C$ is determined by the direction
of integration along $C$. The direction of a distant contour is determined by working one’s
way to it one (small) rectangle at a time.

⁴ This situation is completely analogous to the calculation of the total flux in the derivation
of the divergence theorem.

What happens with a large closed path? Figure 14.6 shows a closed path $C$
with an arbitrary surface $S$ whose boundary is the given curve. We divide $S$
into small rectangular areas and assign a direction to their contours dictated
by the direction of integration around $C$.³ If we sum all the contributions
from the small rectangular paths, we will be left with the integration around
$C$, because the contributions from the common sides of adjacent rectangles
cancel.⁴ This is because the sense of integration along their common side is
opposite for two adjacent rectangles (see Figure 14.6). Thus, the macroscopic
version of Equation (14.4) is


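$$\oint_C \mathbf{A}\cdot d\mathbf{r} \approx \sum_{i=1}^{N}(\nabla\times\mathbf{A})_i\cdot\mathbf{e}_{n_i}\,\Delta a_i = \sum_{i=1}^{N}(\nabla\times\mathbf{A})_i\cdot\Delta\mathbf{a}_i,$$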


the most 
important Stokes' 
theorem 


where $(\nabla\times\mathbf{A})_i$ is the curl of $\mathbf{A}$ evaluated at the center of the $i$th rectangle,
which has area $\Delta a_i$ and normal $\mathbf{e}_{n_i}$, and $N$ is the number of rectangles on
the surface $S$. If the areas become smaller and smaller as $N$ gets larger and
larger, we can replace the summation by an integral and obtain

Theorem 14.2.1. (Stokes’ Theorem). The line integral of a vector field
$\mathbf{A}$ around a closed path $C$ is equal to the surface integral of the curl of $\mathbf{A}$ on
any surface whose only edge is $C$. In mathematical symbols, we have

$$\oint_C \mathbf{A}\cdot d\mathbf{r} = \iint_S \nabla\times\mathbf{A}\cdot d\mathbf{a}. \qquad (14.6)$$

The direction of the normal to the infinitesimal area $d\mathbf{a}$ of the surface $S$ is
related to the direction of integration around $C$ by the right-hand rule.

Example 14.2.2. In this example we apply the concepts of closed line integral
and Stokes’ theorem to a concrete vector field. Consider the vector field

$$\mathbf{A} = K(x^2y\,\mathbf{e}_x + xy^2\,\mathbf{e}_y),$$

obtained from the vector field of Example 14.1.2 by switching the $x$- and $y$-components.
We want to calculate the line integral around the two closed loops (the circle and
the rectangle) of Figure 14.7 and verify Stokes’ theorem.

Figure 14.7: Two loops around which the line integral of the vector field of Example 14.2.2 is calculated.

A convenient parameterization for the circle is

$$x = a\cos t, \qquad y = a\sin t, \qquad 0 \le t \le 2\pi,$$

with $dx = -a\sin t\,dt$ and $dy = a\cos t\,dt$. Thus,

$$\mathbf{A}\cdot d\mathbf{r} = K(a\cos t)^2(a\sin t)(-a\sin t\,dt) + K(a\cos t)(a\sin t)^2(a\cos t\,dt) = 0,$$

and the LHS of Stokes’ theorem is zero. For the RHS, we need the curl of the
vector field:



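$$\nabla\times\mathbf{A} = K\det\begin{pmatrix} \mathbf{e}_x & \mathbf{e}_y & \mathbf{e}_z\\ \dfrac{\partial}{\partial x} & \dfrac{\partial}{\partial y} & \dfrac{\partial}{\partial z}\\ x^2y & xy^2 & 0 \end{pmatrix} = K(y^2 - x^2)\,\mathbf{e}_z.$$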


It is convenient to use cylindrical coordinates for integration over the area of the
circle. Moreover, the right-hand rule determines the unit normal to the area of the
circle to be $\mathbf{e}_z$. Thus,

$$\iint_S \nabla\times\mathbf{A}\cdot d\mathbf{a} = K\int_0^{2\pi}\!\!\int_0^a(\rho^2\sin^2\varphi - \rho^2\cos^2\varphi)\,\rho\,d\rho\,d\varphi = 0$$

by the $\varphi$ integration. Thus the two sides of Stokes’ theorem agree.

The two sides of the rectangular loop sitting on the axes give zero because
$\mathbf{A} = 0$ there. The contribution of the side parallel to the $y$-axis can be obtained by
noting that $x = 2b$ and $dx = 0$, so that

$$\mathbf{A}\cdot d\mathbf{r} = A_x\,dx + A_y\,dy = 0 + 2bKy^2\,dy$$

and

$$\int_{(2b,0)}^{(2b,b)}\mathbf{A}\cdot d\mathbf{r} = 2bK\int_0^b y^2\,dy = \frac{2}{3}Kb^4.$$


To avoid ambiguity,⁵ we employ parameterization for the last line integral. A convenient
parametric equation would be

$$x = 2b(1 - t), \qquad y = b, \qquad 0 \le t \le 1,$$

which gives $dx = -2b\,dt$, $dy = 0$, and for which the line integral yields

$$\int_{(2b,b)}^{(0,b)}\mathbf{A}\cdot d\mathbf{r} = K\int_0^1[2b(1 - t)]^2(b)(-2b\,dt) = -8b^4K\int_0^1(1 - t)^2\,dt = -\frac{8}{3}Kb^4.$$

So, the line integral for the entire loop (the LHS of Stokes’ theorem) is

$$\oint_C \mathbf{A}\cdot d\mathbf{r} = \frac{2}{3}Kb^4 - \frac{8}{3}Kb^4 = -2Kb^4.$$


We have already calculated the curl of $\mathbf{A}$. Thus, the RHS of Stokes’ theorem
becomes

$$\iint_S \nabla\times\mathbf{A}\cdot d\mathbf{a} = K\iint_S(y^2 - x^2)\,dx\,dy = K\Bigg[\underbrace{\int_0^{2b}dx\int_0^b y^2\,dy}_{=2b(b^3/3)} - \underbrace{\int_0^{2b}x^2\,dx\int_0^b dy}_{=(8b^3/3)b}\Bigg] = -2Kb^4$$


and the two sides agree. 
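Both sides of Stokes’ theorem for the rectangular loop of Example 14.2.2 can also be checked numerically. In this sketch (our illustration; $K$ and $b$ are arbitrary values), the four sides are traversed counterclockwise so that the right-hand rule gives $\mathbf{e}_n = \mathbf{e}_z$, and the surface integral of $(\nabla\times\mathbf{A})_z = K(y^2 - x^2)$ is done with nested Simpson rules.

```python
def simpson(g, lo, hi, n=100):
    """Composite Simpson's rule on [lo, hi]; n must be even."""
    h = (hi - lo) / n
    s = g(lo) + g(hi)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * g(lo + i * h)
    return s * h / 3


K, b = 1.0, 0.8  # arbitrary illustrative constants


def A(x, y):
    return K * x**2 * y, K * x * y**2      # field of Example 14.2.2


def side(p, q):
    """Line integral of A along the straight segment from p to q."""
    (xa, ya), (xb, yb) = p, q

    def integrand(t):
        Ax, Ay = A(xa + t * (xb - xa), ya + t * (yb - ya))
        return Ax * (xb - xa) + Ay * (yb - ya)
    return simpson(integrand, 0.0, 1.0)


# Counterclockwise corners, so the right-hand rule gives e_n = e_z.
corners = [(0.0, 0.0), (2 * b, 0.0), (2 * b, b), (0.0, b)]
lhs = sum(side(p, q) for p, q in zip(corners, corners[1:] + corners[:1]))

# RHS: surface integral of (curl A)_z = K (y^2 - x^2) over 0 <= x <= 2b, 0 <= y <= b.
rhs = simpson(lambda x: simpson(lambda y: K * (y**2 - x**2), 0.0, b, n=50), 0.0, 2 * b, n=50)

print(lhs, rhs, -2 * K * b**4)
```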


5 See the discussion following Example 14.1.2. 
George Gabriel
Stokes 1819-1903


conservative 
vector fields 
defined 


George Gabriel Stokes published papers on the motion of incompressible fluids in 
1842-43 and on the friction of fluids in motion, and on the equilibrium and motion 
of elastic solids in 1845. 

In 1849 Stokes was appointed Lucasian Professor of Mathematics at Cambridge, 
and in 1851 he was elected to the Royal Society and was secretary of the society 
from 1854 to 1884 when he was elected president. 

He investigated the wave theory of light, named and explained the phenomenon 
of fluorescence in 1852, and in 1854 theorized an explanation of the Fraunhofer lines 
in the solar spectrum. He suggested these were caused by atoms in the outer layers 
of the Sun absorbing certain wavelengths. However, when Kirchhoff later published 
this explanation, Stokes disclaimed any prior discovery. 

Stokes developed mathematical techniques for application to physical problems 
including the most important theorem which bears his name. He founded the science 
of geodesy, and greatly advanced the study of mathematical physics in England. His 
mathematical and physical papers were published in five volumes, the first three of 
which Stokes edited himself in 1880, 1883, and 1901. The last two were edited by
Sir Joseph Larmor in 1904 and 1905.


14.3 Conservative Vector Fields 

Of great importance are conservative vector fields, which are those vector
fields that have vanishing line integrals around every closed path. An
immediate result of this property is that


Box 14.3.1. The line integral of a conservative vector field between two
arbitrary points in space is independent of the path taken.


To see this, take any two points $P_1$ and $P_2$ connected by two different directed
paths $C_1$ and $C_2$, as shown in Figure 14.8(a). The combination of $C_1$ and the
negative of $C_2$ forms a closed loop [Figure 14.8(b)] for which we can write

$$\int_{C_1}\mathbf{A}\cdot d\mathbf{r} + \int_{-C_2}\mathbf{A}\cdot d\mathbf{r} = 0$$

because $\mathbf{A}$ is conservative by assumption. The second integral is the negative
of the integral along $C_2$. Thus, the above equation is equivalent to

$$\int_{C_1}\mathbf{A}\cdot d\mathbf{r} - \int_{C_2}\mathbf{A}\cdot d\mathbf{r} = 0 \quad\Longrightarrow\quad \int_{C_1}\mathbf{A}\cdot d\mathbf{r} = \int_{C_2}\mathbf{A}\cdot d\mathbf{r},$$

which proves the above claim.

Now take an arbitrary reference point $P_0$ and connect it via arbitrary paths
to all points in space. At each point $P$ with Cartesian coordinates $(x, y, z)$,
define the function $\Phi(x,y,z)$ by

$$\Phi(x,y,z) = -\int_{P_0}^{P}\mathbf{A}\cdot d\mathbf{r} = -\int_C \mathbf{A}\cdot d\mathbf{r}, \qquad (14.7)$$

Figure 14.8: (a) Two paths from $P_1$ to $P_2$, and (b) the loop formed by them.


where $C$ is any path from $P_0$ to $P$ and the minus sign is introduced for
historical reasons only. $\Phi$ is a well-defined function because its value does not
depend on $C$; it is called the potential associated with the vector field $\mathbf{A}$.
We note that the potential at $P_0$ is zero. That is why $P_0$ is called the potential
reference point.

the function Φ, so
defined, has the
mathematical
property expected
of a function,
namely, that for
every point P, the
function has only
one value that we
may denote as
Φ(P).

potential of a
conservative
vector field

Now consider two arbitrary points $P_1$ and $P_2$, with Cartesian coordinates
$(x_1,y_1,z_1)$ and $(x_2,y_2,z_2)$, connected by some path $C$. We can also connect
these two points by a path that goes from $P_1$ to $P_0$ and then to $P_2$ (see
Figure 14.9). Since $\mathbf{A}$ is conservative, we have

$$\int_{P_1}^{P_2}\mathbf{A}\cdot d\mathbf{r} = \int_{P_1}^{P_0}\mathbf{A}\cdot d\mathbf{r} + \int_{P_0}^{P_2}\mathbf{A}\cdot d\mathbf{r} = \Phi(x_1,y_1,z_1) - \Phi(x_2,y_2,z_2)$$

or

$$\Phi(x_2,y_2,z_2) - \Phi(x_1,y_1,z_1) = -\int_{P_1}^{P_2}\mathbf{A}\cdot d\mathbf{r}, \qquad (14.8)$$

which expresses the potential difference between the two points.

If $P_2$ is displaced from $P_1$ by an infinitesimal $d\mathbf{r}$, then the infinitesimal
potential difference between them will be

$$d\Phi = -\mathbf{A}\cdot d\mathbf{r}.$$

On the other hand, $\Phi$, being a scalar differentiable function of $x$, $y$, and $z$, has
infinitesimal increment

$$d\Phi = \frac{\partial\Phi}{\partial x}dx + \frac{\partial\Phi}{\partial y}dy + \frac{\partial\Phi}{\partial z}dz = (\nabla\Phi)\cdot d\mathbf{r},$$

Figure 14.9: Any path $C$ from $P_1$ to $P_2$ is equivalent to the path $P_1 \to P_0 \to P_2$.
the curl of a
conservative
vector field is zero.

$\nabla\times\mathbf{A} = 0$ does
not necessarily
imply that A is
conservative!


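so we have

$$-\mathbf{A}\cdot d\mathbf{r} = (\nabla\Phi)\cdot d\mathbf{r}.$$

But this is true for an arbitrary $d\mathbf{r}$. Taking $d\mathbf{r}$ to be $\mathbf{e}_x\,dx$, $\mathbf{e}_y\,dy$, and $\mathbf{e}_z\,dz$
in turn, we obtain the equality of the three components of $\nabla\Phi$ and $-\mathbf{A}$.
Therefore, we have

$$\mathbf{A} = -\nabla\Phi, \qquad (14.9)$$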

which states that 

Theorem 14.3.1. A conservative vector field can be written as the negative
gradient of a potential function defined as

$$\Phi(x,y,z) = -\int_{P_0}^{P}\mathbf{A}\cdot d\mathbf{r},$$

where $(x,y,z)$ are the coordinates of $P$, and the integral is taken along any
path connecting $P_0$ and $P$.

Another property of a conservative vector field can be obtained by rewriting
Equation (14.4), which is true for an arbitrary infinitesimal closed path:

$$\oint_C \mathbf{A}\cdot d\mathbf{r} \approx (\nabla\times\mathbf{A})\cdot\mathbf{e}_n\,\Delta a.$$

However, the LHS is zero because $\mathbf{A}$ is conservative. Thus we have

$$(\nabla\times\mathbf{A})\cdot\mathbf{e}_n\,\Delta a = 0.$$

This is true for arbitrary $\Delta a$ and $\mathbf{e}_n$. Therefore, we have the important
conclusion that $\nabla\times\mathbf{A} = 0$ for a conservative vector field. It is important to
note that although $\oint_C \mathbf{A}\cdot d\mathbf{r}$ is zero and $C$ is small, we cannot deduce that
$\mathbf{A}\cdot d\mathbf{r} = 0$ and, therefore, $\mathbf{A} = 0$. (Why?)

A conservative vector field demands the vanishing of the curl. But is
$\nabla\times\mathbf{A} = 0$ sufficient for $\mathbf{A}$ to be conservative? The answer, in general, is
no! (See Example 14.3.3 below.) If the vector field is well defined and well
behaved (smoothly varying, differentiable, etc.) in a region of space $U$ in which
any closed curve can be contracted to a point (or “zero” closed curve) without
encountering any singular point of the vector field (where it is not defined or well
behaved), then $\nabla\times\mathbf{A} = 0$ in $U$ implies that $\oint_C \mathbf{A}\cdot d\mathbf{r} = 0$ for all closed curves $C$
lying entirely in $U$. In modern mathematical jargon such a region is said to be
contractible to zero. We state this result as follows:


Box 14.3.2. Let the region $U$ in space be contractible to zero for the vector
field $\mathbf{A}$. Then the two relations $\nabla\times\mathbf{A} = 0$ everywhere in $U$ and
$\oint_C \mathbf{A}\cdot d\mathbf{r} = 0$ for every closed curve $C$ in $U$ are equivalent.
Example 14.3.2. The line integral of the vector field of Example 14.1.2 was
independent of the three paths examined there. Could it be that the vector field is
conservative? The vector field is clearly well behaved everywhere. Therefore, the
vanishing of its curl would prove that it is conservative. And indeed,

$$\nabla\times\mathbf{A} = K\det\begin{pmatrix} \mathbf{e}_x & \mathbf{e}_y & \mathbf{e}_z\\ \dfrac{\partial}{\partial x} & \dfrac{\partial}{\partial y} & \dfrac{\partial}{\partial z}\\ xy^2 & x^2y & 0 \end{pmatrix} = K\left[(0)\,\mathbf{e}_x + (0)\,\mathbf{e}_y + (2xy - 2xy)\,\mathbf{e}_z\right] = 0.$$

So, $\mathbf{A}$ is indeed conservative.

Next we find the potential of $\mathbf{A}$ at a point $(x_0, y_0)$ in the $xy$-plane.⁶ Let the
reference point be the origin. Since it does not matter what path we take, we choose
a straight line joining the origin and $(x_0, y_0)$. A convenient parametric equation is

$$x = x_0t, \qquad y = y_0t, \qquad 0 \le t \le 1,$$

which gives $dx = x_0\,dt$ and $dy = y_0\,dt$. We now have

$$\begin{aligned} \Phi(x_0,y_0) &= -\int_{(0,0)}^{(x_0,y_0)}\mathbf{A}\cdot d\mathbf{r}\\ &= -K\int_0^1\left[(x_0t)(y_0t)^2(x_0\,dt) + (x_0t)^2(y_0t)(y_0\,dt)\right]\\ &= -2Kx_0^2y_0^2\int_0^1 t^3\,dt = -\frac{1}{2}Kx_0^2y_0^2. \end{aligned}$$

We can now substitute $(x, y)$ for $(x_0, y_0)$ to obtain

$$\Phi(x,y) = -\frac{1}{2}Kx^2y^2.$$

The reader may verify that $\mathbf{A} = -\nabla\Phi$. ■
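The potential of Example 14.3.2 can also be obtained numerically from Theorem 14.3.1. The sketch below (illustrative; $K$ and the sample point are arbitrary) integrates along the same straight line from the origin and then checks $\mathbf{A} = -\nabla\Phi$ with central differences; `potential` and `minus_grad` are hypothetical helper names, not anything defined in the text.

```python
def simpson(g, lo, hi, n=1000):
    """Composite Simpson's rule on [lo, hi]; n must be even."""
    h = (hi - lo) / n
    s = g(lo) + g(hi)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * g(lo + i * h)
    return s * h / 3


K = 1.0  # illustrative value of the constant K


def A(x, y):
    return K * x * y**2, K * x**2 * y      # field of Examples 14.1.2 and 14.3.2


def potential(x0, y0):
    """Phi(x0, y0) = -(line integral of A) along x = x0 t, y = y0 t, 0 <= t <= 1; reference point: origin."""
    def integrand(t):
        Ax, Ay = A(x0 * t, y0 * t)
        return Ax * x0 + Ay * y0           # dx = x0 dt, dy = y0 dt
    return -simpson(integrand, 0.0, 1.0)


def minus_grad(phi, x, y, h=1e-5):
    """Central-difference estimate of -grad(phi) at (x, y)."""
    return (-(phi(x + h, y) - phi(x - h, y)) / (2 * h),
            -(phi(x, y + h) - phi(x, y - h)) / (2 * h))


x, y = 0.9, -1.4
print(potential(x, y), -0.5 * K * x**2 * y**2)   # numerical vs. closed-form potential
print(minus_grad(potential, x, y), A(x, y))       # -grad(Phi) should reproduce A
```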


It should be clear that $\nabla\times\mathbf{A} \neq 0$ always implies that $\mathbf{A}$ is not conservative.
However, $\nabla\times\mathbf{A} = 0$ guarantees that $\mathbf{A}$ is conservative only if the region in
question is contractible to zero.

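Example 14.3.3. Consider the vector field

$$\mathbf{A} = \frac{ky}{x^2+y^2}\,\mathbf{e}_x - \frac{kx}{x^2+y^2}\,\mathbf{e}_y,$$

where $k$ is a constant. Since the components of this vector are independent of $z$, the
curl of the vector can have only a $z$-component:

$$\nabla\times\mathbf{A} = \det\begin{pmatrix} \mathbf{e}_x & \mathbf{e}_y & \mathbf{e}_z\\ \dfrac{\partial}{\partial x} & \dfrac{\partial}{\partial y} & \dfrac{\partial}{\partial z}\\ A_x & A_y & 0 \end{pmatrix} = \left(\frac{\partial A_y}{\partial x} - \frac{\partial A_x}{\partial y}\right)\mathbf{e}_z.$$

From now on we ignore the $z$-coordinate completely, because $\mathbf{A}$ neither has a component
in that direction nor depends on $z$.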
The reader may easily verify that

$$\frac{\partial A_y}{\partial x} = -\frac{k}{x^2+y^2} + \frac{2kx^2}{(x^2+y^2)^2}, \qquad \frac{\partial A_x}{\partial y} = \frac{k}{x^2+y^2} - \frac{2ky^2}{(x^2+y^2)^2},$$

so that

$$\frac{\partial A_y}{\partial x} - \frac{\partial A_x}{\partial y} = -\frac{2k}{x^2+y^2} + \frac{2k(x^2+y^2)}{(x^2+y^2)^2} = 0$$

and $\nabla\times\mathbf{A} = 0$.

Now take a circle of radius $a$ about the origin and calculate the line integral of
$\mathbf{A}$ on this circle. For integration, use the parameterization

$$x = a\cos t, \qquad y = a\sin t, \qquad 0 \le t \le 2\pi,$$

with $dx = -a\sin t\,dt$ and $dy = a\cos t\,dt$. Then

$$\mathbf{A}\cdot d\mathbf{r} = A_x\,dx + A_y\,dy = \frac{k(a\sin t)(-a\sin t\,dt)}{(a\cos t)^2 + (a\sin t)^2} - \frac{k(a\cos t)(a\cos t\,dt)}{(a\cos t)^2 + (a\sin t)^2} = -k\,dt$$

and, therefore,

$$\oint_C \mathbf{A}\cdot d\mathbf{r} = -k\int_0^{2\pi}dt = -2\pi k.$$

This is an example of a vector field whose curl vanishes but which yields a nonzero
result for a closed line integral. The reason is, of course, that the region inside the
circle is not contractible to zero: at the origin the vector field is not defined (its
magnitude, $k/\sqrt{x^2+y^2}$, grows without bound). ■
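The contrast in Example 14.3.3 between a vanishing curl and a nonvanishing circulation shows up clearly in a small numerical experiment. This sketch (ours, with $k = 1$ chosen arbitrarily) estimates the curl by central differences at two points away from the origin and compares the circulation around a circle enclosing the origin with the circulation around a circle that does not.

```python
import math


def simpson(g, lo, hi, n=2000):
    """Composite Simpson's rule on [lo, hi]; n must be even."""
    h = (hi - lo) / n
    s = g(lo) + g(hi)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * g(lo + i * h)
    return s * h / 3


k = 1.0  # illustrative value of the constant k


def A(x, y):
    r2 = x * x + y * y
    return k * y / r2, -k * x / r2       # field of Example 14.3.3 (undefined at the origin)


def circulation_circle(cx, cy, R):
    """Counterclockwise line integral of A around the circle of radius R centred at (cx, cy)."""
    def integrand(t):
        x, y = cx + R * math.cos(t), cy + R * math.sin(t)
        Ax, Ay = A(x, y)
        return Ax * (-R * math.sin(t)) + Ay * (R * math.cos(t))
    return simpson(integrand, 0.0, 2 * math.pi)


def curl_z(x, y, h=1e-5):
    """Central-difference estimate of dA_y/dx - dA_x/dy at a point (x, y) != (0, 0)."""
    dAy_dx = (A(x + h, y)[1] - A(x - h, y)[1]) / (2 * h)
    dAx_dy = (A(x, y + h)[0] - A(x, y - h)[0]) / (2 * h)
    return dAy_dx - dAx_dy


print(curl_z(0.5, 1.2), curl_z(-2.0, 0.3))          # both essentially zero
around_origin = circulation_circle(0.0, 0.0, 1.0)   # encloses the singular point
off_origin = circulation_circle(3.0, 0.0, 1.0)      # its disk contains no singular point
print(around_origin, -2 * math.pi * k, off_origin)
```

The circle centred at $(3, 0)$ bounds a region free of singular points, so the curl-free field gives zero circulation there, consistent with Box 14.3.2; only the loop around the origin picks up $-2\pi k$.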


If the vector field is conservative, in principle we can determine its potential 
either by direct antidifferentiation or by integration. The following example 
illustrates the former procedure. 

Example 14.3.4. Consider the vector field

$$\mathbf{A} = (2xy + 3z^2)\,\mathbf{e}_x + (x^2 + 4yz)\,\mathbf{e}_y + (2y^2 + 6xz)\,\mathbf{e}_z.$$

The reader may check that $\nabla\times\mathbf{A} = 0$. Thus, since $\mathbf{A}$ is well defined everywhere,
it is conservative. To find its potential $\Phi$, we note that

$$\frac{\partial\Phi}{\partial x} = -A_x = -2xy - 3z^2 \quad\Longrightarrow\quad \Phi = -x^2y - 3z^2x + g(y,z),$$


where we have simply antidifferentiated $-A_x$ with respect to $x$ (treating $y$
and $z$ as constants) and added a “constant” of integration: as far as $x$
differentiation is concerned, any function of $y$ and $z$ is a constant. Now differentiate
$\Phi$ obtained this way with respect to $y$ and set it equal to $-A_y$:

$$-A_y = -(x^2 + 4zy) = \frac{\partial\Phi}{\partial y} = \frac{\partial}{\partial y}\left(-x^2y - 3z^2x + g(y,z)\right) = -x^2 + \frac{\partial g}{\partial y}.$$

This gives

$$\frac{\partial g}{\partial y} = -4yz \quad\Longrightarrow\quad g(y,z) = -2y^2z + h(z).$$

Note that our second “constant” of integration has no $x$-dependence because $g(y,z)$
does not depend on $x$. Substituting this back in the expression for $\Phi$, we obtain

$$\Phi = -x^2y - 3z^2x + g(y,z) = -x^2y - 3z^2x - 2y^2z + h(z).$$
Finally, differentiating this with respect to $z$ and setting it equal to $-A_z$, we obtain

$$-A_z = -(2y^2 + 6xz) = \frac{\partial\Phi}{\partial z} = \frac{\partial}{\partial z}\left(-x^2y - 3z^2x - 2y^2z + h(z)\right) = -6xz - 2y^2 + \frac{dh}{dz}.$$

This gives

$$\frac{dh}{dz} = 0 \quad\Longrightarrow\quad h(z) = \text{const.} = C.$$

The final answer is therefore

$$\Phi(x, y, z) = -x^2y - 3z^2x - 2y^2z + C.$$


The arbitrary constant depends on the potential reference point, and is zero if we choose the origin as that point. It is easy to verify that $-\nabla\Phi$ is indeed the vector field we started with. ■
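The claim that $-\nabla\Phi$ reproduces $\mathbf{A}$ can also be checked numerically. This sketch uses central differences (a choice made here, not the method of the text) at a few arbitrary sample points, with $C = 0$.

```python
def field(x, y, z):
    """The vector field A of Example 14.3.4."""
    return (2 * x * y + 3 * z ** 2, x ** 2 + 4 * y * z, 2 * y ** 2 + 6 * x * z)


def potential(x, y, z, c=0.0):
    """The potential found above, Phi = -x^2 y - 3 z^2 x - 2 y^2 z + C."""
    return -x ** 2 * y - 3 * z ** 2 * x - 2 * y ** 2 * z + c


def minus_gradient(f, point, h=1e-5):
    """Central-difference estimate of -grad f at the given point."""
    result = []
    for i in range(3):
        plus, minus = list(point), list(point)
        plus[i] += h
        minus[i] -= h
        result.append(-(f(*plus) - f(*minus)) / (2 * h))
    return tuple(result)


for p in [(0.0, 0.0, 0.0), (1.0, -2.0, 0.5), (-0.3, 0.8, 2.0)]:
    print(p, minus_gradient(potential, p), field(*p))
```

At $(1, -2, 0.5)$, for instance, both sides equal $(-3.25, -3, 11)$ up to rounding.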


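There are various vector identities that connect gradient, divergence, and curl. Most of these identities can be obtained by direct substitution. For example, by substituting the Cartesian components of $\mathbf{A}\times\mathbf{B}$ in the Cartesian expression for divergence, one can show that

$$\nabla\cdot(\mathbf{A}\times\mathbf{B}) = \mathbf{B}\cdot(\nabla\times\mathbf{A}) - \mathbf{A}\cdot(\nabla\times\mathbf{B}). \tag{14.10}$$

Similarly, one can show that

$$\begin{aligned}
\nabla\cdot(f\mathbf{A}) &= \mathbf{A}\cdot\nabla f + f\,\nabla\cdot\mathbf{A},\\
\nabla\times(f\mathbf{A}) &= f\,\nabla\times\mathbf{A} + (\nabla f)\times\mathbf{A},\\
\mathbf{A}\times(\nabla\times\mathbf{A}) &= \tfrac{1}{2}\nabla|\mathbf{A}|^2 - (\mathbf{A}\cdot\nabla)\mathbf{A}.
\end{aligned} \tag{14.11}$$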
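Identities such as (14.10) and the second relation in (14.11) can be spot-checked numerically before proving them. The sketch below estimates derivatives by central differences; the test fields `field_a`, `field_b`, and `scalar_f` are arbitrary choices made for illustration, not taken from the text.

```python
import math

H = 1e-4  # finite-difference step


def jacobian(F, p):
    """J[j][i] = dF_j/dx_i by central differences; F maps (x, y, z) to a tuple."""
    columns = []
    for i in range(3):
        plus, minus = list(p), list(p)
        plus[i] += H
        minus[i] -= H
        fp, fm = F(*plus), F(*minus)
        columns.append([(a - b) / (2 * H) for a, b in zip(fp, fm)])
    return [[columns[i][j] for i in range(3)] for j in range(len(columns[0]))]


def div(F, p):
    J = jacobian(F, p)
    return J[0][0] + J[1][1] + J[2][2]


def curl(F, p):
    J = jacobian(F, p)
    return (J[2][1] - J[1][2], J[0][2] - J[2][0], J[1][0] - J[0][1])


def grad(f, p):
    return tuple(jacobian(lambda x, y, z: (f(x, y, z),), p)[0])


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1], u[2] * v[0] - u[0] * v[2], u[0] * v[1] - u[1] * v[0])


# Arbitrary smooth test fields (hypothetical examples).
def field_a(x, y, z):
    return (x * y, math.sin(z), x * z ** 2)


def field_b(x, y, z):
    return (y ** 2, z, math.exp(x))


def scalar_f(x, y, z):
    return x ** 2 + y * z


p = (0.3, -0.7, 1.1)

# Equation (14.10)
lhs_1410 = div(lambda x, y, z: cross(field_a(x, y, z), field_b(x, y, z)), p)
rhs_1410 = dot(field_b(*p), curl(field_a, p)) - dot(field_a(*p), curl(field_b, p))

# Second relation in (14.11): curl(f A) = f curl A + (grad f) x A
lhs_1411 = curl(lambda x, y, z: tuple(scalar_f(x, y, z) * c for c in field_a(x, y, z)), p)
rhs_1411 = tuple(scalar_f(*p) * c + d
                 for c, d in zip(curl(field_a, p), cross(grad(scalar_f, p), field_a(*p))))

print(lhs_1410, rhs_1410)
print(lhs_1411, rhs_1411)
```

A numerical agreement at one point is of course no proof; it is only a guard against sign slips before doing the component-by-component derivation.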


We can use Equation (14.10) to derive an important vector integral relation akin to the divergence theorem. Let $\mathbf{B}$ be a constant vector. Then the second term on the RHS of (14.10) vanishes. Now apply the divergence theorem to the vector field $\mathbf{A}\times\mathbf{B}$:

$$\iint_S (\mathbf{A}\times\mathbf{B})\cdot d\mathbf{a} = \iiint_V \nabla\cdot(\mathbf{A}\times\mathbf{B})\,dV.$$


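Using Equation (14.10), the RHS can be written as

$$\text{RHS} = \iiint_V \mathbf{B}\cdot(\nabla\times\mathbf{A})\,dV = \mathbf{B}\cdot\iiint_V \nabla\times\mathbf{A}\,dV.$$

Moreover, the cyclic property of the mixed triple product (see Problem 1.15) enables us to write the LHS as

$$\text{LHS} = \iint_S (d\mathbf{a}\times\mathbf{A})\cdot\mathbf{B} = \iint_S \mathbf{B}\cdot(d\mathbf{a}\times\mathbf{A}) = \mathbf{B}\cdot\iint_S d\mathbf{a}\times\mathbf{A}.$$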
Equating the new versions of the two sides, we obtain

$$\mathbf{B}\cdot\iiint_V \nabla\times\mathbf{A}\,dV = \mathbf{B}\cdot\iint_S d\mathbf{a}\times\mathbf{A}$$

or

$$\mathbf{B}\cdot\left[\iiint_V \nabla\times\mathbf{A}\,dV - \iint_S d\mathbf{a}\times\mathbf{A}\right] = 0.$$

Since the last relation holds for arbitrary $\mathbf{B}$, the vector inside the brackets must be zero. This gives the result we are after:

$$\iiint_V \nabla\times\mathbf{A}\,dV = \iint_S d\mathbf{a}\times\mathbf{A}. \tag{14.12}$$
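A numerical spot check of (14.12) on the unit cube may be reassuring. The field $\mathbf{A} = yz\,\mathbf{e}_x + x^2\,\mathbf{e}_y + xyz\,\mathbf{e}_z$ is a hypothetical choice; its curl, $(xz,\ y - yz,\ 2x - z)$, was worked out by hand. Both sides are approximated with the midpoint rule, and each face of the cube contributes $d\mathbf{a}\times\mathbf{A}$ with $d\mathbf{a}$ along the outward normal.

```python
def field(x, y, z):
    """Hypothetical test field A = (yz, x^2, xyz)."""
    return (y * z, x ** 2, x * y * z)


def curl_field(x, y, z):
    """Curl of the test field above, computed by hand."""
    return (x * z, y - y * z, 2 * x - z)


def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1], u[2] * v[0] - u[0] * v[2], u[0] * v[1] - u[1] * v[0])


def volume_side(n=40):
    """Midpoint-rule estimate of the volume integral of curl A over the unit cube."""
    h = 1.0 / n
    total = [0.0, 0.0, 0.0]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                c = curl_field((i + 0.5) * h, (j + 0.5) * h, (k + 0.5) * h)
                for m in range(3):
                    total[m] += c[m] * h ** 3
    return total


def surface_side(n=40):
    """Midpoint-rule estimate of the closed-surface integral of da x A over the cube."""
    h = 1.0 / n
    total = [0.0, 0.0, 0.0]
    for axis in range(3):
        # The face at coordinate 1 has outward normal +e_axis; the face at 0 has -e_axis.
        for value, sign in ((1.0, 1.0), (0.0, -1.0)):
            da = [0.0, 0.0, 0.0]
            da[axis] = sign * h * h
            for i in range(n):
                for j in range(n):
                    point = [(i + 0.5) * h, (j + 0.5) * h]
                    point.insert(axis, value)
                    contribution = cross(da, field(*point))
                    for m in range(3):
                        total[m] += contribution[m]
    return total


print(volume_side())   # close to (1/4, 1/4, 1/2)
print(surface_side())  # agrees with the line above to about 1e-4
```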


14.4 Problems 

14.1. Evaluate the line integral of

$$\mathbf{A}(x, y, z) = x\,\mathbf{e}_x + y\,\mathbf{e}_y + z\,\mathbf{e}_z$$

along the path given parametrically by

$$x = at^2, \qquad y = bt, \qquad z = c\sin(\pi t/2)$$

from the origin to $(a, b, c)$.

14.2. Evaluate the line integral of

$$\mathbf{A}(x, y, z) = x\,\mathbf{e}_x + \frac{y^2}{b}\,\mathbf{e}_y - \frac{z^2}{c}\,\mathbf{e}_z$$

along the path given parametrically by

$$x = a\cos(\pi t/2), \qquad y = b\sin(\pi t/2), \qquad z = ct$$

from $(a, 0, 0)$ to $(0, b, c)$.

14.3. Evaluate the line integral of 

A (x,y) = xg x + '—Gy 

along the closed ellipse given parametrically by

$$x = a\cos t, \qquad y = b\sin t.$$

14.4. Show that $\nabla\times(\mathbf{A}\times\mathbf{r}) = 2\mathbf{A}$ for a constant vector $\mathbf{A}$.
14.5. Let

$$\mathbf{A}(x, y) = A_x(x, y)\,\mathbf{e}_x + A_y(x, y)\,\mathbf{e}_y, \qquad \mathbf{B}(x, y) = B_x(x, y)\,\mathbf{e}_x + B_y(x, y)\,\mathbf{e}_y$$

be vectors in two dimensions.

(a) Apply the divergence theorem to $\mathbf{A}$ using a volume $V$ enclosed by a cylinder whose bottom base is an arbitrary closed curve $C$ in the $xy$-plane, whose top base is the same curve in a plane parallel to the $xy$-plane, and whose lateral side is parallel to the $z$-axis. Now conclude that

$$\oint_C (A_x\,dy - A_y\,dx) = \iint_R \left(\frac{\partial A_x}{\partial x} + \frac{\partial A_y}{\partial y}\right) dx\,dy,$$

where $R$ is the region enclosed by $C$ in the $xy$-plane. This is the divergence theorem in two dimensions.

(b) Apply Stokes’ theorem to $\mathbf{B}$ with $C$ as above and $S$ the region $R$ defined above. Show that

$$\oint_C (B_x\,dx + B_y\,dy) = \iint_R \left(\frac{\partial B_y}{\partial x} - \frac{\partial B_x}{\partial y}\right) dx\,dy.$$

This is Stokes’ theorem in two dimensions.

(c) Show that in two dimensions Stokes’ theorem and the divergence theorem are the same.


14.6. Evaluate the line integral of

$$\mathbf{A}(x, y) = (x^2 + 3y)\,\mathbf{e}_x + (y^2 + 2x)\,\mathbf{e}_y$$

from the origin to the point $(1, 2)$:

(a) along the straight line joining the two points; and

(b) along the parabola passing through the two points as well as the point $(-1, 2)$.

(c) Is $\mathbf{A}$ conservative?

14.7. Is the vector field $\mathbf{A}(x, y) = x e^{x^2}\cos y\,\mathbf{e}_x - \tfrac{1}{2}e^{x^2}\sin y\,\mathbf{e}_y$ conservative? If so, find its potential.


14.8. A vector field is given by 


A 


b 2 L 


y 



■ xe v 


xy. 


e {x+z)/b 


where 4*0 and b are constants. 

(a) Determine whether or not A is conservative. 

(b) Find the potential of A if it is conservative. 


14.9. The components of a vector field are given by

$$A_x = V_0k^3yz\,e^{k^2xy}, \qquad A_y = V_0k^3xz\,e^{k^2xy} + V_0k\sin ky, \qquad A_z = V_0k\,e^{k^2xy}.$$


(a) Determine whether A is conservative or not. 

(b) If it is conservative, find its potential. 
14.10. The Cartesian components of a vector are given by

$$A_x = 2ax\,e^{kz}, \qquad A_y = 2ay\,e^{kz}, \qquad A_z = ka(x^2 + y^2)\,e^{kz},$$

where $a$ and $k$ are constants.

(a) Test whether $\mathbf{A}$ is conservative or not.

(b) If $\mathbf{A}$ is conservative, find its potential.

14.11. Prove Equations (14.10) and (14.11). 

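14.12. Show that

$$\nabla(\mathbf{A}\cdot\mathbf{B}) = (\mathbf{B}\cdot\nabla)\mathbf{A} + (\mathbf{A}\cdot\nabla)\mathbf{B} + \mathbf{B}\times(\nabla\times\mathbf{A}) + \mathbf{A}\times(\nabla\times\mathbf{B})$$

and that, for a constant vector $\mathbf{A}$,

$$\mathbf{A}\times(\nabla\times\mathbf{B}) = \nabla(\mathbf{A}\cdot\mathbf{B}) - (\mathbf{A}\cdot\nabla)\mathbf{B}.$$

14.13. Verify the vector identity

$$\nabla\times(\mathbf{A}\times\mathbf{B}) = (\mathbf{B}\cdot\nabla)\mathbf{A} - (\mathbf{A}\cdot\nabla)\mathbf{B} - \mathbf{B}(\nabla\cdot\mathbf{A}) + \mathbf{A}(\nabla\cdot\mathbf{B}).$$

14.14. Verify that for constant $\mathbf{A}$ and $\mathbf{B}$

$$\nabla[\mathbf{A}\cdot(\mathbf{B}\times\mathbf{r})] = \mathbf{A}\times\mathbf{B}.$$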




Chapter 15 

Applied Vector Analysis 


In the last three chapters, we introduced the operator $\nabla$ and used it to make vectors out of scalars (gradient), scalars out of vectors (divergence), and new vectors out of old ones (curl). It is obvious that all these processes can be combined to form new scalars and vectors. For instance, one can create a vector out of a scalar by the operation of gradient and use the resulting vector as an input for the operation of divergence. Since almost all equations of physics involve derivatives of at most second order, we shall confine our treatment to “double del operations” in this chapter.

15.1 Double Del Operations 

We can make different combinations of the vector operator $\nabla$ with itself. By direct differentiation we can easily verify that

$$\nabla\times(\nabla f) = 0. \tag{15.1}$$

Equation (14.9) states that a conservative vector field is the gradient of its potential. Equation (15.1) says, on the other hand, that if a field is the gradient of a function, then it is conservative.¹ We can combine these two
statements into one by saying that 


Box 15.1.1. A vector field is conservative (i.e., its curl vanishes) if and 
only if it can be written as the gradient of a scalar function, in which case 
the scalar function is the field's potential. 


Example 15.1.1. The electrostatic and gravitational fields, which we denote generically by $\mathbf{A}$, are given by an equation of the form

$$\mathbf{A}(\mathbf{r}) = K\int dQ(\mathbf{r}')\,\frac{\mathbf{r}-\mathbf{r}'}{|\mathbf{r}-\mathbf{r}'|^3}.$$

1 Assuming that the region in which the gradient of the function is defined is contractible to zero, i.e., the region has no point at which the gradient is infinite.


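Furthermore, the reader may show that (see Problem 12.17)

$$\frac{\mathbf{r}-\mathbf{r}'}{|\mathbf{r}-\mathbf{r}'|^3} = -\nabla\left(\frac{1}{|\mathbf{r}-\mathbf{r}'|}\right). \tag{15.2}$$

Substitution in the above integral then yields

$$\mathbf{A}(\mathbf{r}) = -K\int dQ(\mathbf{r}')\,\nabla\left(\frac{1}{|\mathbf{r}-\mathbf{r}'|}\right) = -\nabla\left(K\int \frac{dQ(\mathbf{r}')}{|\mathbf{r}-\mathbf{r}'|}\right) = -\nabla\Phi(\mathbf{r}), \tag{15.3}$$

where $\Phi$, the potential of $\mathbf{A}$, is given by

$$\Phi(\mathbf{r}) \equiv K\int \frac{dQ(\mathbf{r}')}{|\mathbf{r}-\mathbf{r}'|}. \tag{15.4}$$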


Equation (15.3), in conjunction with Equation (15.1), automatically implies that both the electrostatic and gravitational fields are conservative. ■

In a similar fashion, we can directly verify the following identity:

$$\nabla\cdot(\nabla\times\mathbf{A}) = 0. \tag{15.5}$$

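Example 15.1.2. Magnetic fields can also be written in terms of the so-called vector potentials. To find the expression for the vector potential, we substitute Equation (15.2) in the magnetic field integral:

$$\mathbf{B} = k_m\int dq(\mathbf{r}')\,\mathbf{v}(\mathbf{r}')\times\frac{\mathbf{r}-\mathbf{r}'}{|\mathbf{r}-\mathbf{r}'|^3} = k_m\int dq(\mathbf{r}')\,\mathbf{v}(\mathbf{r}')\times\left[-\nabla\left(\frac{1}{|\mathbf{r}-\mathbf{r}'|}\right)\right].$$

We want to take the $\nabla$ out of the integral. However, the cross product prevents a direct “pull out,” so we need to get around this by manipulating the integrand. Using the second relation in Equation (14.11), we can write

$$\nabla\times\left(\frac{\mathbf{v}(\mathbf{r}')}{|\mathbf{r}-\mathbf{r}'|}\right) = \frac{1}{|\mathbf{r}-\mathbf{r}'|}\,\nabla\times\mathbf{v} - \mathbf{v}\times\nabla\left(\frac{1}{|\mathbf{r}-\mathbf{r}'|}\right) = -\mathbf{v}(\mathbf{r}')\times\nabla\left(\frac{1}{|\mathbf{r}-\mathbf{r}'|}\right).$$

We note that $\nabla\times\mathbf{v} = 0$ because $\nabla$ differentiates with respect to $(x, y, z)$, of which $\mathbf{v}(\mathbf{r}')$ is independent. Substituting this last relation in the expression for $\mathbf{B}$, we obtain

$$\mathbf{B} = k_m\int dq(\mathbf{r}')\,\nabla\times\left(\frac{\mathbf{v}(\mathbf{r}')}{|\mathbf{r}-\mathbf{r}'|}\right) = \nabla\times\left(k_m\int \frac{dq(\mathbf{r}')\,\mathbf{v}(\mathbf{r}')}{|\mathbf{r}-\mathbf{r}'|}\right) = \nabla\times\mathbf{A}, \tag{15.6}$$

where we have taken $\nabla\times$ out of the integral because it differentiates with respect to $(x, y, z)$, whereas the integration is over the source coordinates $\mathbf{r}'$, which are independent of $(x, y, z)$. The vector potential $\mathbf{A}$ is defined by the last line, which we rewrite as

$$\mathbf{A} = k_m\int \frac{dq(\mathbf{r}')\,\mathbf{v}(\mathbf{r}')}{|\mathbf{r}-\mathbf{r}'|}. \tag{15.7}$$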
If the charges are confined to one dimension, so that we have a current loop, then $dq(\mathbf{r}')\,\mathbf{v}(\mathbf{r}') = I\,d\mathbf{r}'$ and Equation (15.7) reduces to

$$\mathbf{A} = k_m I\oint \frac{d\mathbf{r}'}{|\mathbf{r}-\mathbf{r}'|}. \tag{15.8}$$

An important consequence of Equations (15.6) and (15.5) is

$$\nabla\cdot\mathbf{B} = 0. \tag{15.9}$$

Since the divergence of a vector field is related to the density of its source, we conclude that there are no magnetic charges.

This statement is within the context of classical electromagnetic theory. More recently, with the development of unified theories of the fundamental interactions, there have been theoretical arguments for the existence of magnetic charges (or monopoles). However, although such theories predict (very rare) occurrences of monopoles, no experimental confirmation of their existence has been made. ■




15.2 Magnetic Multipoles 

The similarity between the vector potential [Equation (15.8)] and the electrostatic potential motivates the expansion of the former in terms of multipoles, as was done in (10.33). We carry this expansion only up to the dipole term. Substituting Equation (10.32) in Equation (15.8), we obtain

$$\mathbf{A} = k_m I\oint\left(\frac{1}{r} + \frac{\mathbf{e}_r\cdot\mathbf{r}'}{r^2} + \cdots\right)d\mathbf{r}' = \frac{k_m I}{r}\underbrace{\oint d\mathbf{r}'}_{=0} + \frac{k_m I}{r^2}\oint \mathbf{e}_r\cdot\mathbf{r}'\,d\mathbf{r}' + \cdots.$$

The reader can easily show that the first integral vanishes (Problem 15.5). 

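To facilitate calculating the second integral, choose Cartesian coordinates and orient your axes so that $\mathbf{e}_r$ is in the $x$-direction. Denote the integral by $\mathbf{V}$. Then

$$\mathbf{V} \equiv \oint \mathbf{e}_r\cdot\mathbf{r}'\,d\mathbf{r}' = \oint x'\,d\mathbf{r}' = \oint x'(\mathbf{e}_x\,dx' + \mathbf{e}_y\,dy' + \mathbf{e}_z\,dz').$$

We evaluate each component of $\mathbf{V}$ separately:

$$V_x = \oint x'\,dx' = \frac{1}{2}\oint d(x'^2) = \frac{1}{2}\,x'^2\Big|_{\text{beginning}}^{\text{end}} = 0$$

because the beginning and end points of a loop coincide.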
Now consider the identity

$$\oint(x'\,dy' + y'\,dx') = \oint d(x'y') = (x'y')\Big|_{\text{beginning}}^{\text{end}} = 0, \tag{15.10}$$


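with an analogous identity involving $x'$ and $z'$. For the $y$-component of $\mathbf{V}$, we have

$$V_y = \oint x'\,dy' = \frac{1}{2}\oint x'\,dy' + \frac{1}{2}\oint x'\,dy' + \underbrace{\frac{1}{2}\oint y'\,dx' - \frac{1}{2}\oint y'\,dx'}_{\text{these add up to nothing!}}$$

$$= \underbrace{\frac{1}{2}\oint(x'\,dy' + y'\,dx')}_{=0\ \text{by Equation (15.10)}} + \frac{1}{2}\oint(x'\,dy' - y'\,dx') = \frac{1}{2}\oint(\mathbf{r}'\times d\mathbf{r}')_z = \frac{1}{2}\left(\oint \mathbf{r}'\times d\mathbf{r}'\right)\cdot\mathbf{e}_z = \frac{1}{I}\,\boldsymbol{\mu}\cdot\mathbf{e}_z.$$

It follows that

$$A_y = \frac{k_m I}{r^2}\,V_y = \frac{k_m}{r^2}\,\boldsymbol{\mu}\cdot\mathbf{e}_z,$$

where we have defined the magnetic dipole moment $\boldsymbol{\mu}$ as

$$\boldsymbol{\mu} = \frac{I}{2}\oint \mathbf{r}'\times d\mathbf{r}'. \tag{15.11}$$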


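A similar calculation will yield

$$A_z = \frac{k_m I}{r^2}\,V_z = -\frac{k_m I}{r^2}\,\frac{1}{2}\left(\oint \mathbf{r}'\times d\mathbf{r}'\right)\cdot\mathbf{e}_y = -\frac{k_m}{r^2}\,\boldsymbol{\mu}\cdot\mathbf{e}_y.$$

Therefore, since $A_x = 0$ (because $V_x = 0$),

$$\mathbf{A} = A_x\mathbf{e}_x + A_y\mathbf{e}_y + A_z\mathbf{e}_z = \frac{k_m}{r^2}\left(\mathbf{e}_y\,\boldsymbol{\mu}\cdot\mathbf{e}_z - \mathbf{e}_z\,\boldsymbol{\mu}\cdot\mathbf{e}_y\right) = \frac{k_m}{r^2}\,\boldsymbol{\mu}\times(\mathbf{e}_y\times\mathbf{e}_z),$$

where the last step uses the bac cab rule. Recalling that $\mathbf{e}_y\times\mathbf{e}_z = \mathbf{e}_x$, and that by our choice of orientation of the axes $\mathbf{e}_r = \mathbf{e}_x$, we finally obtain

$$\mathbf{A} = \frac{k_m\,\boldsymbol{\mu}\times\mathbf{e}_r}{r^2} = \frac{k_m\,\boldsymbol{\mu}\times\mathbf{r}}{r^3}. \tag{15.12}$$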


There is a striking resemblance between the vector potential of a magnetic dipole [Equation (15.12)] and the scalar potential of an electric dipole [the second term in the last line of Equation (10.33)]: the scalar potential is given in terms of the scalar (dot) product of the electric dipole moment and the position vector, whereas the vector potential is given in terms of the vector (cross) product of the magnetic dipole moment and the position vector.
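One way to see how well (15.12) approximates the full vector potential is to evaluate (15.8) numerically for a circular loop and compare. The sketch below assumes a loop of radius $a$ in the $xy$-plane carrying current $I$ counterclockwise, takes $\boldsymbol{\mu} = I\pi a^2\,\mathbf{e}_z$ from Example 15.2.1, and sets $k_m = 1$ as an arbitrary unit choice; the helper names are illustrative.

```python
import math


def loop_potential(r, a=1.0, current=1.0, km=1.0, n=4000):
    """Equation (15.8) for a circular loop of radius a in the xy-plane
    (counterclockwise current), evaluated with the midpoint rule in the angle t."""
    x, y, z = r
    dt = 2.0 * math.pi / n
    total = [0.0, 0.0, 0.0]
    for i in range(n):
        t = (i + 0.5) * dt
        xs, ys = a * math.cos(t), a * math.sin(t)  # source point r'
        dist = math.sqrt((x - xs) ** 2 + (y - ys) ** 2 + z ** 2)
        dr = (-a * math.sin(t) * dt, a * math.cos(t) * dt, 0.0)
        for m in range(3):
            total[m] += km * current * dr[m] / dist
    return total


def dipole_potential(r, a=1.0, current=1.0, km=1.0):
    """Equation (15.12), A = km mu x r / r^3, with mu = I pi a^2 e_z."""
    x, y, z = r
    mu = current * math.pi * a ** 2
    r3 = (x * x + y * y + z * z) ** 1.5
    return [-km * mu * y / r3, km * mu * x / r3, 0.0]


def relative_error(r):
    exact, approx = loop_potential(r), dipole_potential(r)
    diff = math.sqrt(sum((u - v) ** 2 for u, v in zip(exact, approx)))
    return diff / math.sqrt(sum(u * u for u in exact))


for scale in (2.0, 5.0, 20.0):
    point = (scale * 0.6, scale * 0.3, scale * 0.74)
    print(scale, relative_error(point))
```

The relative error shrinks roughly like $(a/r)^2$ as the field point moves away, which is what one expects once the dipole term dominates.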


Example 15.2.1. Let us calculate the magnetic dipole moment of a circular current loop of radius $a$. Placing the circle in the $xy$-plane with its center at the origin, we have

$$\boldsymbol{\mu} = \frac{I}{2}\oint \mathbf{r}'\times d\mathbf{r}' = \frac{I}{2}\int_0^{2\pi}(a\,\mathbf{e}_{\rho'})\times(a\,d\varphi'\,\mathbf{e}_{\varphi'}) = \frac{Ia^2}{2}\int_0^{2\pi} d\varphi'\,\mathbf{e}_z = I\pi a^2\,\mathbf{e}_z.$$

So, the magnitude of the magnetic dipole moment of a circular loop of current is the product of the current and the area of the loop. Its direction is related to the direction of the current by the right-hand rule. ■
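The integral $\boldsymbol{\mu} = (I/2)\oint \mathbf{r}'\times d\mathbf{r}'$ is easy to approximate for any closed loop by replacing the loop with a polygon. The sketch below (function names are illustrative) recovers $I\pi a^2\,\mathbf{e}_z$ for the circle and, for a planar ellipse with semi-axes $a$ and $b$, the value $I\pi ab\,\mathbf{e}_z$, consistent with the current-times-area rule holding for other planar loops as well.

```python
import math


def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1], u[2] * v[0] - u[0] * v[2], u[0] * v[1] - u[1] * v[0])


def dipole_moment(points, current):
    """mu = (I/2) * sum of r_i x (r_{i+1} - r_i) over a closed polygon (last point joins the first)."""
    mu = [0.0, 0.0, 0.0]
    n = len(points)
    for i in range(n):
        r = points[i]
        dr = tuple(b - a for a, b in zip(points[i], points[(i + 1) % n]))
        c = cross(r, dr)
        for m in range(3):
            mu[m] += 0.5 * current * c[m]
    return mu


def ellipse(a, b, n=10000):
    """Counterclockwise polygon approximating an ellipse in the xy-plane."""
    return [(a * math.cos(2 * math.pi * k / n), b * math.sin(2 * math.pi * k / n), 0.0)
            for k in range(n)]


current, radius = 2.0, 0.5
print(dipole_moment(ellipse(radius, radius), current), current * math.pi * radius ** 2)
print(dipole_moment(ellipse(2.0, 1.0), current), current * math.pi * 2.0 * 1.0)
```

Reversing the order of the points reverses the sign of $\mu_z$, in line with the right-hand rule.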
15.3 Laplacian

The divergence of the gradient is an important and frequently occurring operator called the Laplacian:

$$\nabla\cdot(\nabla f) \equiv \nabla^2 f = \frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2} + \frac{\partial^2 f}{\partial z^2}. \tag{15.13}$$

The Laplacian occurs throughout physics, in situations ranging from the waves on a drum to the diffusion of matter in space, the propagation of electromagnetic waves, and even the most basic behavior of matter on a subatomic scale, as governed by the Schrödinger equation of quantum mechanics.

We discuss one situation in which the Laplacian occurs naturally. The result of the example above and Theorem 13.2.4 can be combined to obtain an important equation in electrostatics and gravity called the Poisson equation: $\nabla\cdot(-\nabla\Phi) = 4\pi K\rho_Q$, or

$$\nabla^2\Phi(\mathbf{r}) = -4\pi K\rho_Q(\mathbf{r}). \tag{15.14}$$

This is a partial differential equation whose solution determines the potential at various points in space.² In many situations the density in the region of interest is zero. Then the RHS vanishes and we obtain an important special case of the above equation called Laplace’s equation:

$$\nabla^2\Phi(\mathbf{r}) = 0. \tag{15.15}$$


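Consider a fixed point $P$ in space with Cartesian coordinates $(x_0, y_0, z_0)$ and position vector $\mathbf{r}_0$. Take another (variable) point with Cartesian coordinates $(x, y, z)$ and position vector $\mathbf{r}$. By direct differentiation, one can verify that

$$\nabla\cdot\left(\frac{\mathbf{r}-\mathbf{r}_0}{|\mathbf{r}-\mathbf{r}_0|^3}\right) = 0$$

at all points of space except at $\mathbf{r} = \mathbf{r}_0$, where the vector is not defined. Moreover, if $S$ is any closed surface bounding a volume $V$, we have

$$\iint_S \frac{\mathbf{r}-\mathbf{r}_0}{|\mathbf{r}-\mathbf{r}_0|^3}\cdot d\mathbf{a} = \begin{cases} 4\pi & \text{if } P \text{ is in } V,\\ 0 & \text{if } P \text{ is not in } V,\end{cases}$$

by Theorem 12.1.2. On the other hand, the divergence theorem relates the LHS of this equation to the volume integral of the divergence. Thus,

$$\iiint_V \nabla\cdot\left(\frac{\mathbf{r}-\mathbf{r}_0}{|\mathbf{r}-\mathbf{r}_0|^3}\right) dV = \begin{cases} 4\pi & \text{if } P \text{ is in } V,\\ 0 & \text{if } P \text{ is not in } V.\end{cases} \tag{15.16}$$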
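The two cases in (15.16) can be illustrated numerically by computing the flux through a unit sphere centered at the origin, first with $P$ inside it and then outside it. This is a rough sketch using the midpoint rule in the spherical angles; the sample points and grid sizes are arbitrary choices.

```python
import math


def flux_through_unit_sphere(r0, n_theta=400, n_phi=100):
    """Midpoint-rule flux of (r - r0)/|r - r0|^3 through the unit sphere centered at the origin.

    On the unit sphere r equals the outward unit normal n, and da = n sin(theta) dtheta dphi.
    """
    dtheta = math.pi / n_theta
    dphi = 2.0 * math.pi / n_phi
    total = 0.0
    for i in range(n_theta):
        theta = (i + 0.5) * dtheta
        weight = math.sin(theta) * dtheta * dphi
        for j in range(n_phi):
            phi = (j + 0.5) * dphi
            n = (math.sin(theta) * math.cos(phi), math.sin(theta) * math.sin(phi), math.cos(theta))
            d = [n[k] - r0[k] for k in range(3)]  # r - r0 evaluated on the sphere
            dist3 = sum(c * c for c in d) ** 1.5
            total += sum(d[k] * n[k] for k in range(3)) / dist3 * weight
    return total


print(flux_through_unit_sphere((0.1, -0.05, 0.15)), 4 * math.pi)  # P inside the sphere
print(flux_through_unit_sphere((2.0, 1.0, 0.5)))                  # P outside the sphere
```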


2 The reader should consider this, and any other differential equation, as a local equation, 
meaning that the derivatives on the LHS and the quantities on the RHS are to be evaluated 
at the same point. 




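This shows that $\nabla\cdot[(\mathbf{r}-\mathbf{r}_0)/|\mathbf{r}-\mathbf{r}_0|^3]$ is zero everywhere except at $P$, yet its volume integral is not zero. This is reminiscent of the three-dimensional Dirac delta function. In fact, it follows from Equation (15.16) that

$$\nabla\cdot\left(\frac{\mathbf{r}-\mathbf{r}_0}{|\mathbf{r}-\mathbf{r}_0|^3}\right) = 4\pi\,\delta(\mathbf{r}-\mathbf{r}_0). \tag{15.17}$$

Using Equation (15.2) and the definition of the Laplacian, we also get

$$\nabla^2\left(\frac{1}{|\mathbf{r}-\mathbf{r}_0|}\right) = -4\pi\,\delta(\mathbf{r}-\mathbf{r}_0). \tag{15.18}$$

The last double-del operation we consider is

$$\nabla\times(\nabla\times\mathbf{A}) = \nabla(\nabla\cdot\mathbf{A}) - \nabla^2\mathbf{A}, \tag{15.19}$$

which holds only in Cartesian coordinates (where $\nabla^2\mathbf{A}$ means the Laplacian of each component of $\mathbf{A}$) and can be verified component by component.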

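Example 15.3.1. Angular Momentum Operator. In quantum mechanics, the angular momentum $\mathbf{L} = \mathbf{r}\times\mathbf{p}$ becomes the differential operator $\mathbf{L} = -i\hbar\,\mathbf{r}\times\nabla$, where $\hbar$ is the reduced Planck constant, which we set equal to 1 in the following discussion. The quantity $L^2 = |\mathbf{L}|^2$ appears frequently in applications of quantum mechanics. It is therefore instructive to compute this quantity.

Since $L^2$ is a differential operator, we let it act on some function $f$ and carry out the differentiation until we get a simple result. Since

$$L^2 = L_x^2 + L_y^2 + L_z^2,$$

we let each component act on $f$ separately. First note that

$$\begin{aligned}
L_x f &= -i(\mathbf{r}\times\nabla f)_x = -i\left(y\frac{\partial f}{\partial z} - z\frac{\partial f}{\partial y}\right),\\
L_y f &= -i(\mathbf{r}\times\nabla f)_y = -i\left(z\frac{\partial f}{\partial x} - x\frac{\partial f}{\partial z}\right),\\
L_z f &= -i(\mathbf{r}\times\nabla f)_z = -i\left(x\frac{\partial f}{\partial y} - y\frac{\partial f}{\partial x}\right).
\end{aligned} \tag{15.20}$$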


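Therefore,

$$-L_x^2 f = \left(y\frac{\partial}{\partial z} - z\frac{\partial}{\partial y}\right)\left(y\frac{\partial f}{\partial z} - z\frac{\partial f}{\partial y}\right) = y^2\frac{\partial^2 f}{\partial z^2} + z^2\frac{\partial^2 f}{\partial y^2} - y\frac{\partial f}{\partial y} - z\frac{\partial f}{\partial z} - 2yz\frac{\partial^2 f}{\partial y\,\partial z}.$$

Similarly,

$$-L_y^2 f = x^2\frac{\partial^2 f}{\partial z^2} + z^2\frac{\partial^2 f}{\partial x^2} - x\frac{\partial f}{\partial x} - z\frac{\partial f}{\partial z} - 2xz\frac{\partial^2 f}{\partial x\,\partial z},$$

$$-L_z^2 f = x^2\frac{\partial^2 f}{\partial y^2} + y^2\frac{\partial^2 f}{\partial x^2} - x\frac{\partial f}{\partial x} - y\frac{\partial f}{\partial y} - 2xy\frac{\partial^2 f}{\partial x\,\partial y}.$$

Adding the three components and using a little algebra, we get

$$-L^2 f = r^2\nabla^2 f - \left(x^2\frac{\partial^2 f}{\partial x^2} + y^2\frac{\partial^2 f}{\partial y^2} + z^2\frac{\partial^2 f}{\partial z^2}\right) - 2\,\mathbf{r}\cdot(\nabla f) - 2\left(yz\frac{\partial^2 f}{\partial y\,\partial z} + xz\frac{\partial^2 f}{\partial x\,\partial z} + xy\frac{\partial^2 f}{\partial x\,\partial y}\right). \tag{15.21}$$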


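Let $A$ denote the sum of the first parenthesized expression in (15.21) and twice the second, so that (15.21) reads $-L^2 f = r^2\nabla^2 f - 2\,\mathbf{r}\cdot(\nabla f) - A$. We can write $A$ in a compact form by expanding $(\mathbf{r}\cdot\nabla)(\mathbf{r}\cdot\nabla f)$:

$$(\mathbf{r}\cdot\nabla)^2 f \equiv (\mathbf{r}\cdot\nabla)(\mathbf{r}\cdot\nabla f) = \left(x\frac{\partial}{\partial x} + y\frac{\partial}{\partial y} + z\frac{\partial}{\partial z}\right)\left(x\frac{\partial f}{\partial x} + y\frac{\partial f}{\partial y} + z\frac{\partial f}{\partial z}\right)$$

$$= \underbrace{x\frac{\partial f}{\partial x} + x^2\frac{\partial^2 f}{\partial x^2} + xy\frac{\partial^2 f}{\partial x\,\partial y} + xz\frac{\partial^2 f}{\partial x\,\partial z}}_{\text{comes from } x \text{ differentiation}} + \text{terms from } y \text{ and } z \text{ differentiation}.$$

Adding the terms from $x$, $y$, and $z$ differentiations, we obtain

$$(\mathbf{r}\cdot\nabla)^2 f = \mathbf{r}\cdot(\nabla f) + A \quad\text{or}\quad A = (\mathbf{r}\cdot\nabla)^2 f - \mathbf{r}\cdot(\nabla f).$$

Substituting this in (15.21) yields

$$L^2 f = -r^2\nabla^2 f + \mathbf{r}\cdot(\nabla f) + (\mathbf{r}\cdot\nabla)^2 f. \tag{15.22}$$

As a differential operator, $L^2$ is written as

$$L^2 = -r^2\nabla^2 + \mathbf{r}\cdot\nabla + (\mathbf{r}\cdot\nabla)^2. \tag{15.23}$$

We shall come back to this discussion in Chapter 17 to show how index manipulation eases the calculation (see Example 17.3.3). ■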
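Equation (15.23) can be spot-checked on functions whose behavior under $L^2$ is known: for a homogeneous harmonic polynomial of degree $l$ (that is, $r^l$ times a spherical harmonic), $L^2 f = l(l+1)f$ with $\hbar = 1$, while the rotationally invariant $r^2$ gives zero. The sketch evaluates the right-hand side of (15.22) by central differences, using the expansion $(\mathbf{r}\cdot\nabla)^2 f = \mathbf{r}\cdot\nabla f + \sum_{i,j} x_i x_j\,\partial_i\partial_j f$ obtained above; the test functions, sample point, and step size are arbitrary choices.

```python
H = 1e-3  # finite-difference step


def shifted(p, i, s):
    q = list(p)
    q[i] += s
    return q


def gradient(f, p):
    return [(f(*shifted(p, i, H)) - f(*shifted(p, i, -H))) / (2 * H) for i in range(3)]


def hessian(f, p):
    """Second partial derivatives of f at p by central differences."""
    hess = [[0.0] * 3 for _ in range(3)]
    f0 = f(*p)
    for i in range(3):
        hess[i][i] = (f(*shifted(p, i, H)) - 2 * f0 + f(*shifted(p, i, -H))) / H ** 2
        for j in range(i + 1, 3):
            pp = shifted(shifted(p, i, H), j, H)
            pm = shifted(shifted(p, i, H), j, -H)
            mp = shifted(shifted(p, i, -H), j, H)
            mm = shifted(shifted(p, i, -H), j, -H)
            hess[i][j] = hess[j][i] = (f(*pp) - f(*pm) - f(*mp) + f(*mm)) / (4 * H * H)
    return hess


def l_squared(f, p):
    """Right-hand side of (15.22): -r^2 lap f + r.grad f + (r.grad)^2 f."""
    g, hs = gradient(f, p), hessian(f, p)
    r2 = sum(c * c for c in p)
    laplacian = hs[0][0] + hs[1][1] + hs[2][2]
    r_dot_grad = sum(p[i] * g[i] for i in range(3))
    second = sum(p[i] * p[j] * hs[i][j] for i in range(3) for j in range(3))
    # (r.grad)^2 f = r.grad f + sum_ij x_i x_j d_i d_j f
    return -r2 * laplacian + r_dot_grad + (r_dot_grad + second)


p = (0.4, -0.9, 1.3)
tests = {
    "z (l = 1)": (lambda x, y, z: z, 2),
    "xy (l = 2)": (lambda x, y, z: x * y, 6),
    "xyz (l = 3)": (lambda x, y, z: x * y * z, 12),
    "r^2": (lambda x, y, z: x * x + y * y + z * z, 0),
}
for name, (f, eigenvalue) in tests.items():
    print(name, l_squared(f, p), eigenvalue * f(*p))
```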


15.3.1 A Primer of Fluid Dynamics 

We have already talked about the flow of a fluid in Section 13.2.3, where 
we derived the continuity equation, which states the conservation of mass in 
mathematical terms. We now want to take up the dynamics of a fluid, i.e., 
the motion of various parts of the fluid due to the forces acting on them. 

Consider a volume $V$ of the fluid bounded by a surface $S$. The pressure $p$ exerted from outside at any point of $S$ on the element of area $d\mathbf{a}$ is normal to $S$ at that point and points into the volume $V$. Thus, the element of force due to pressure is $-p\,d\mathbf{a}$. If pressure is the only source of force on the volume $V$ of the fluid, then the total force on $V$ is

$$\mathbf{F} = -\iint_S p\,d\mathbf{a}.$$

Using Equation (13.12), we rewrite this as

$$\mathbf{F} = -\iint_S p\,d\mathbf{a} = -\iiint_V \nabla p\,dV.$$
This shows that $-\nabla p$ is a force density, whose volume integral gives the force. If the density of the fluid is $\rho$ and the mass element $dm$ in $V$ has velocity $\mathbf{v}$, then the “mass times acceleration” is $dm\,d\mathbf{v}/dt = \rho\,dV\,(d\mathbf{v}/dt)$, and the total “mass times acceleration” is the volume integral of this quantity. If other forces, described by a force density $\mathbf{f}$, act on the fluid, we can add their contribution to the right-hand side. Thus, Newton’s second law of motion gives

$$\iiint_V \rho\,\frac{d\mathbf{v}}{dt}\,dV = -\iiint_V \nabla p\,dV + \iiint_V \mathbf{f}\,dV,$$

and this holds for any volume $V$, in particular for an infinitesimal volume, for which the integrals become the integrands. Hence, the second law of motion for the fluid is

$$\rho\,\frac{d\mathbf{v}}{dt} = -\nabla p + \mathbf{f}. \tag{15.24}$$

The total time derivative of velocity is

$$\frac{d\mathbf{v}}{dt} = \frac{\partial\mathbf{v}}{\partial t} + \frac{\partial\mathbf{v}}{\partial x}\frac{dx}{dt} + \frac{\partial\mathbf{v}}{\partial y}\frac{dy}{dt} + \frac{\partial\mathbf{v}}{\partial z}\frac{dz}{dt} = \frac{\partial\mathbf{v}}{\partial t} + (\mathbf{v}\cdot\nabla)\mathbf{v}.$$




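Substituting this in (15.24) and dividing by $\rho$ yields

$$\frac{\partial\mathbf{v}}{\partial t} + (\mathbf{v}\cdot\nabla)\mathbf{v} = -\frac{\nabla p}{\rho} + \frac{\mathbf{f}}{\rho}. \tag{15.25}$$

This is Euler’s equation and is one of the fundamental equations of fluid dynamics.

The force density $\mathbf{f}$ in Euler’s equation is usually that of the gravitational force. Since the gravitational force on an element of mass $\rho\,dV$ is $\mathbf{g}\,\rho\,dV$, where $\mathbf{g}$ is the gravitational acceleration (or field), the gravitational force density is $\rho\mathbf{g}$ and (15.25) becomes

$$\frac{\partial\mathbf{v}}{\partial t} + (\mathbf{v}\cdot\nabla)\mathbf{v} = -\frac{\nabla p}{\rho} + \mathbf{g}. \tag{15.26}$$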

Example 15.3.2. In hydrostatic situations with a uniform gravitational field, the fluid is not moving and Equation (15.26) becomes

$$\nabla p = \rho\mathbf{g},$$

and if $\mathbf{g}$ is in the negative $z$-direction, then

$$\frac{\partial p}{\partial x} = \frac{\partial p}{\partial y} = 0, \qquad \frac{\partial p}{\partial z} = -\rho g.$$

Thus the pressure is independent of $x$ and $y$, and depends only on the height $z$. We assume that the fluid (really the liquid) is incompressible, meaning that its density does not depend on the pressure. Then, integrating the $z$ equation gives

$$p = -\rho g z + C.$$

If the liquid has a free surface at $z = h$ where the pressure is $p_0$, then $C = p_0 + \rho g h$, and

$$p = p_0 + \rho g(h - z).$$
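The hydrostatic result is easy to check by integrating $dp/dz = -\rho g$ step by step downward from the free surface. In the sketch below the numbers (fresh water with $\rho = 1000$ kg/m³, $g = 9.81$ m/s², atmospheric $p_0 = 101325$ Pa, surface 10 m above $z = 0$) are illustrative assumptions, not values from the text.

```python
def pressure_closed_form(z, h, p0, rho, g):
    """p = p0 + rho g (h - z) for an incompressible liquid with free surface at z = h."""
    return p0 + rho * g * (h - z)


def pressure_by_integration(z, h, p0, rho, g, steps=1000):
    """Integrate dp/dz = -rho g from z = h (where p = p0) down to z in equal steps."""
    dz = (z - h) / steps
    p = p0
    for _ in range(steps):
        p += -rho * g * dz
    return p


rho, g, p0 = 1000.0, 9.81, 101325.0  # illustrative values (SI units)
h = 10.0                             # free surface 10 m above z = 0
print(pressure_closed_form(0.0, h, p0, rho, g))
print(pressure_by_integration(0.0, h, p0, rho, g))
```

Both give $199425$ Pa at $z = 0$: the extra pressure $\rho g h = 98100$ Pa at a depth of 10 m is roughly one additional atmosphere.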
Example 15.3.3. Stellar equilibrium. A star is a large mass of fluid held together by gravitational attraction. If the star is in equilibrium, its fluid has no motion and (15.26) becomes

$$\nabla p = \rho\mathbf{g} \quad\text{or}\quad \nabla p = -\rho\nabla\Phi,$$

where $\Phi$ is the gravitational potential. Dividing this equation by $\rho$ and taking the divergence of both sides, we obtain

$$\nabla\cdot\left(\frac{\nabla p}{\rho}\right) = -\nabla^2\Phi \quad\text{or}\quad \nabla\cdot\left(\frac{\nabla p}{\rho}\right) = -4\pi G\rho,$$

where we used the Poisson equation (15.14), which for gravity reads $\nabla^2\Phi = 4\pi G\rho$. For a spherically symmetric star, only the radial coordinate enters in the equation above, and borrowing from the next chapter the expressions (16.7) for gradient and (16.12) for divergence in spherical coordinates, the equation above takes the form

$$\frac{1}{r^2}\frac{d}{dr}\left(\frac{r^2}{\rho}\frac{dp}{dr}\right) = -4\pi G\rho.$$

This is one of the fundamental equations of astrophysics.
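As a sanity check of the stellar-equilibrium equation, consider the simplest (and unrealistic) case of a star of uniform density $\rho$ and radius $R$ with $p(R) = 0$. Integrating once gives $dp/dr = -\tfrac{4\pi}{3}G\rho^2 r$, and integrating again gives $p(r) = \tfrac{2\pi}{3}G\rho^2(R^2 - r^2)$. The sketch below integrates the equation numerically in two trapezoidal-rule stages and compares with this closed form; the uniform-density model and the unit choice $G = \rho = R = 1$ are assumptions made for illustration.

```python
import math


def uniform_star_pressure(n=2000, G=1.0, rho=1.0, R=1.0):
    """Solve (1/r^2) d/dr (r^2/rho dp/dr) = -4 pi G rho for constant rho, with p(R) = 0.

    Stage 1: y = (r^2/rho) dp/dr obeys dy/dr = -4 pi G rho r^2 with y(0) = 0.
    Stage 2: dp/dr = rho y / r^2 is integrated inward from the surface.
    Both stages use the trapezoidal rule on a uniform grid.
    """
    dr = R / n
    r = [k * dr for k in range(n + 1)]
    y = [0.0] * (n + 1)
    for k in range(n):
        rhs0 = -4 * math.pi * G * rho * r[k] ** 2
        rhs1 = -4 * math.pi * G * rho * r[k + 1] ** 2
        y[k + 1] = y[k] + 0.5 * dr * (rhs0 + rhs1)
    # dp/dr = rho*y/r^2 tends to 0 at the center because y behaves like r^3 there.
    dpdr = [0.0] + [rho * y[k] / r[k] ** 2 for k in range(1, n + 1)]
    p = [0.0] * (n + 1)
    for k in range(n - 1, -1, -1):
        p[k] = p[k + 1] - 0.5 * dr * (dpdr[k] + dpdr[k + 1])
    return r, p


r, p = uniform_star_pressure()
exact_center = 2 * math.pi / 3  # (2 pi/3) G rho^2 R^2 with G = rho = R = 1
print(p[0], exact_center)
```

For a realistic star the density varies with $r$ and must be supplied by an equation of state, so this constant-density case only tests the bookkeeping of the equation.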


15.4 Maxwell’s Equations 

No treatment of vector analysis is complete without a discussion of Maxwell’s equations. Electromagnetism was both the producer and the consumer of vector analysis. It started with the accidental discovery by Ørsted in 1820 that an electric current produced a magnetic field. Subsequently, an intense search was undertaken by many physicists, such as Ampère and Faraday, to find a connection between electric and magnetic phenomena. By the mid-1800s, a fairly good theory of electromagnetism had been attained which, in the contemporary language of vectors, translates into the following four equations:

$$\begin{array}{ll}
(1)\quad \displaystyle\iint_S \mathbf{E}\cdot d\mathbf{a} = \frac{Q}{\epsilon_0}; & \qquad (2)\quad \displaystyle\iint_S \mathbf{B}\cdot d\mathbf{a} = 0;\\[3ex]
(3)\quad \displaystyle\oint_C \mathbf{E}\cdot d\mathbf{r} = -\frac{d\phi_m}{dt}; & \qquad (4)\quad \displaystyle\oint_C \mathbf{B}\cdot d\mathbf{r} = \mu_0 I.
\end{array} \tag{15.27}$$

The first integral, Gauss’s law (or Coulomb’s law in disguise), states that the electric flux through the closed surface $S$ is essentially the total charge $Q$ in the volume surrounded by $S$. The second integral says that the corresponding flux for a magnetic field is zero. The fact that this holds for an arbitrary surface implies that there are no magnetic charges. The third equation, Faraday’s law, connects the electric field to the rate of change of magnetic flux $\phi_m$. Finally, the last equation, Ampère’s law, states that the source of the magnetic field is the electric current $I$. The constants $\epsilon_0$ and $\mu_0$ arise from a particular set of units used for charges and currents.




15.4.1 Maxwell’s Contribution 


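Equations (15.27) can be cast in differential form as well. The differential form of the equations is important because it places particular emphasis on the fields, which are the primary objects. The differential forms of the equations above are:

$$\begin{array}{ll}
(1)\quad \nabla\cdot\mathbf{E} = \dfrac{\rho}{\epsilon_0}; & \qquad (2)\quad \nabla\cdot\mathbf{B} = 0;\\[2ex]
(3)\quad \nabla\times\mathbf{E} = -\dfrac{\partial\mathbf{B}}{\partial t}; & \qquad (4)\quad \nabla\times\mathbf{B} = \mu_0\mathbf{J}.
\end{array} \tag{15.28}$$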


We have already derived the first two equations in Theorem 13.2.4 and Equation (15.9). Here we derive the third equation and leave the derivation of the last equation, which is very similar to that of the third, to the reader. Stokes’ theorem turns the LHS of the third equation of (15.27) into

$$\text{LHS} = \iint_S (\nabla\times\mathbf{E})\cdot d\mathbf{a}.$$

The RHS is

$$\text{RHS} = -\frac{d\phi_m}{dt} = -\frac{d}{dt}\iint_S \mathbf{B}\cdot d\mathbf{a} = -\iint_S \frac{\partial\mathbf{B}}{\partial t}\cdot d\mathbf{a},$$

where we have assumed that the change in the flux comes about solely due to a change in the magnetic field. This makes it possible to push the time differentiation inside the integral, upon which it becomes a partial derivative because $\mathbf{B}$ is a function of position as well. Since the last two equations hold for arbitrary $S$, the integrands must be equal. This proves the third equation in (15.28).

Maxwell inherited the four equations in (15.28) and started pondering them in the 1860s. He noticed that while the second and third are consistent with other aspects of electromagnetism, the other two equations lead to a contradiction. Let us retrace his argument. By Equation (15.5), the divergence of the LHS of the last equation of (15.28) vanishes. Therefore, taking the divergence of both sides, we get $\nabla\cdot\mathbf{J} = 0$. This contradicts the differential form of the continuity equation (13.22) for charges, which expresses the conservation of electric charge. Because charge conservation was firmly established, Maxwell decided to try altering the four equations to make them compatible with it. The clue is in the first equation. If we differentiate that equation with respect to time, we obtain

$$\frac{\partial}{\partial t}\nabla\cdot\mathbf{E} = \frac{1}{\epsilon_0}\frac{\partial\rho}{\partial t} \quad\Longrightarrow\quad \nabla\cdot\left(\frac{\partial\mathbf{E}}{\partial t}\right) = \frac{1}{\epsilon_0}\frac{\partial\rho}{\partial t}.$$

Combined with the continuity equation $\partial\rho/\partial t + \nabla\cdot\mathbf{J} = 0$, this gives $\nabla\cdot(\mathbf{J} + \epsilon_0\,\partial\mathbf{E}/\partial t) = 0$. This suggested to Maxwell that, if the four equations are to be consistent with charge conservation, the fourth equation had to be modified by adding $\epsilon_0\,\partial\mathbf{E}/\partial t$ to $\mathbf{J}$. With this modification, the four equations in (15.28) become
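$$\begin{array}{ll}
(1)\quad \nabla\cdot\mathbf{E} = \dfrac{\rho}{\epsilon_0}; & \qquad (2)\quad \nabla\cdot\mathbf{B} = 0;\\[2ex]
(3)\quad \nabla\times\mathbf{E} = -\dfrac{\partial\mathbf{B}}{\partial t}; & \qquad (4)\quad \nabla\times\mathbf{B} = \mu_0\mathbf{J} + \mu_0\epsilon_0\dfrac{\partial\mathbf{E}}{\partial t}.
\end{array} \tag{15.29}$$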


It was a great moment in the history of physics and mathematics when Maxwell, prompted solely by the forces of logic and pure deduction, introduced the second term in the last equation. Such moments were rare prior to Maxwell: apart from Copernicus’s introduction of the heliocentric theory of the solar system and Descartes’s introduction of analytic geometry, deductive reasoning was the exception rather than the rule. Theories and laws were empirical (or inductive); they were introduced to fit the data and summarize, more or less directly, the numerous observations made. Maxwell broke this tradition and set the stage for deductive reasoning, which, after a great deal of struggle to abandon the inductive tradition, became the norm for modern physics.

Today, we aptly call all four equations in (15.29) Maxwell’s equations, although his contribution to those equations was a “mere” introduction of the second term on the RHS of the last equation. However, no other “small” contribution has ever affected humankind so enormously. This very “small” contribution was responsible for Maxwell’s prediction of electromagnetic waves, which were subsequently produced in the laboratory in 1887 (only eight years after Maxwell’s premature death) and put to technological use in 1901 in the form of the first radio. Today, Maxwell’s equations are at the heart of every electronic device. Without them, our entire civilization, as we know it, would be nonexistent.




15.4.2 Electromagnetic Waves in Empty Space 


Let us look at some of the implications of Maxwell’s equations. Taking the curl of the third Maxwell equation and using (15.19) and the first and fourth equations of (15.29), we obtain for the LHS




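$$\text{LHS} = \nabla\times(\nabla\times\mathbf{E}) = \nabla(\nabla\cdot\mathbf{E}) - \nabla^2\mathbf{E} = \frac{1}{\epsilon_0}\nabla\rho - \nabla^2\mathbf{E},$$

and for the RHS

$$\text{RHS} = -\nabla\times\frac{\partial\mathbf{B}}{\partial t} = -\frac{\partial}{\partial t}(\nabla\times\mathbf{B}) = -\frac{\partial}{\partial t}\left(\mu_0\mathbf{J} + \mu_0\epsilon_0\frac{\partial\mathbf{E}}{\partial t}\right) = -\mu_0\frac{\partial\mathbf{J}}{\partial t} - \mu_0\epsilon_0\frac{\partial^2\mathbf{E}}{\partial t^2}.$$

In particular, in free space, where $\rho = 0$ and $\mathbf{J} = 0$, these equations give

$$\nabla^2\mathbf{E} - \mu_0\epsilon_0\frac{\partial^2\mathbf{E}}{\partial t^2} = 0. \tag{15.30}$$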


This is a three-dimensional wave equation.³ Recall that the inverse of the coefficient of the second time derivative is the square of the speed of propagation of the wave. It follows that

$$v = \frac{1}{\sqrt{\mu_0\epsilon_0}} = \frac{1}{\sqrt{(4\pi\times 10^{-7})(8.854\times 10^{-12})}} = 2.998\times 10^{8}\ \text{m/s},$$

i.e., the electric field propagates in empty space with the speed of light, $c$. The reader may check that the magnetic field also satisfies the same wave equation, and that it too propagates with the same speed. In fact, it can be shown that the so-called plane wave solutions of Maxwell’s equations consist of an electric and a magnetic component, which are coupled to one another and therefore do not propagate independently (see Problem 15.9).
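The arithmetic above, and the claim that the wave equation (15.30) is satisfied by a disturbance traveling at $v = 1/\sqrt{\mu_0\epsilon_0}$, can both be checked in a few lines. The sketch uses the constants quoted in the text and, as an illustrative assumption, a one-dimensional sinusoid $E(x, t) = \sin[k(x - vt)]$ for a single field component; second derivatives are taken by central differences with the time step matched to the space step ($v\,\Delta t = \Delta x$).

```python
import math

MU0 = 4 * math.pi * 1e-7  # T m/A, value used in the text
EPS0 = 8.854e-12          # F/m, value used in the text


def wave_speed(mu0=MU0, eps0=EPS0):
    """Speed read off from the wave equation (15.30): v = 1/sqrt(mu0 eps0)."""
    return 1.0 / math.sqrt(mu0 * eps0)


def wave_residual(x, t, k=2.0, dx=1e-3):
    """Finite-difference value of d2E/dx2 - mu0 eps0 d2E/dt2 for E = sin(k (x - v t))."""
    v = wave_speed()
    dt = dx / v

    def field(x_, t_):
        return math.sin(k * (x_ - v * t_))

    d2x = (field(x + dx, t) - 2 * field(x, t) + field(x - dx, t)) / dx ** 2
    d2t = (field(x, t + dt) - 2 * field(x, t) + field(x, t - dt)) / dt ** 2
    return d2x - MU0 * EPS0 * d2t


print(wave_speed())              # about 2.998e8 m/s
print(wave_residual(0.3, 1e-9))  # zero up to rounding error
```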

Sometimes it is more convenient to work with potentials than with the fields themselves. The vanishing of the divergence of magnetic fields suggests that $\mathbf{B} = \nabla\times\mathbf{A}$, where $\mathbf{A}$ is the vector potential [see also Equation (15.6)]. The vector potential, like its scalar counterpart, has some degree of arbitrariness, because adding the gradient of an arbitrary function does not change its curl. This is an example of a gauge transformation, whereby a measurable physical quantity (here, the magnetic field) does not change when another (nonmeasurable) physical quantity is changed. Using this expression for $\mathbf{B}$ in the third Maxwell equation, we obtain

$$\nabla\times\mathbf{E} = -\frac{\partial}{\partial t}(\nabla\times\mathbf{A}) \quad\Longrightarrow\quad \nabla\times\left(\mathbf{E} + \frac{\partial\mathbf{A}}{\partial t}\right) = 0 \quad\Longrightarrow\quad \mathbf{E} + \frac{\partial\mathbf{A}}{\partial t} = -\nabla\Phi,$$

where we switched the order of differentiation with respect to position and time, and used the fact that if the curl of a vector vanishes, that vector is the gradient of a function (Box 15.1.1). We therefore write

$$\mathbf{E} = -\frac{\partial\mathbf{A}}{\partial t} - \nabla\Phi \quad\text{and}\quad \mathbf{B} = \nabla\times\mathbf{A}. \tag{15.31}$$

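Substituting these two expressions in the fourth Maxwell equation, we obtain

$$\nabla\times(\nabla\times\mathbf{A}) = \mu_0\mathbf{J} + \frac{1}{c^2}\frac{\partial}{\partial t}\left(-\frac{\partial\mathbf{A}}{\partial t} - \nabla\Phi\right),$$

where $c^2 = 1/(\mu_0\epsilon_0)$. Expanding the LHS using the double curl identity of Equation (15.19), and switching time and space partial derivatives, yields

$$\nabla\left(\nabla\cdot\mathbf{A} + \frac{1}{c^2}\frac{\partial\Phi}{\partial t}\right) - \nabla^2\mathbf{A} + \frac{1}{c^2}\frac{\partial^2\mathbf{A}}{\partial t^2} = \mu_0\mathbf{J}.$$

Because of the gauge freedom, we can choose $\mathbf{A}$ and $\Phi$ to satisfy

$$\nabla\cdot\mathbf{A} + \frac{1}{c^2}\frac{\partial\Phi}{\partial t} = 0. \tag{15.32}$$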


3 The reader may be familiar with the one-dimensional wave equation in which only one 
second partial derivative with respect to a single space coordinate appears. 
This choice is called the Lorentz gauge, from which it follows that

$$\nabla^2\mathbf{A} - \frac{1}{c^2}\frac{\partial^2\mathbf{A}}{\partial t^2} = -\mu_0\mathbf{J}. \tag{15.33}$$

Similarly, by taking the divergence of the first equation in (15.31) and using the first Maxwell equation and the Lorentz gauge, we obtain

$$\nabla^2\Phi - \frac{1}{c^2}\frac{\partial^2\Phi}{\partial t^2} = -\frac{\rho}{\epsilon_0}. \tag{15.34}$$

Equations (15.32), (15.33), and (15.34) are the fundamental equations of electromagnetic theory. They give the solutions not only in empty space, where $\rho = 0$ and $\mathbf{J} = 0$, but also when the sources are not zero, i.e., when the mechanism of wave production becomes of interest, as in radiation and antenna theory.


James Clerk Maxwell attended Edinburgh Academy where he had the nickname 
“Dafty.” While still at school he had two papers published by the Royal Society of 
Edinburgh. Maxwell then went to Peterhouse, Cambridge, but moved to Trinity, 
where it was easier to obtain a fellowship. Maxwell graduated with a degree in 
mathematics from Trinity College in 1854. 

He held chairs at Marischal College in Aberdeen (1856) and married the daughter 
of the Principal. However in 1860 Marischal College and King’s College combined 
and Maxwell, as the junior of the department, had to seek another post. After failing 
to gain an appointment to a vacant chair at Edinburgh he was appointed to King’s 
College in London (1860) and became the first Cavendish Professor of Physics at 
Cambridge in 1871. 

Maxwell’s first major contribution to science was a study of the planet Saturn’s rings, which won him the Adams Prize at Cambridge. He showed that stability could be achieved only if the rings consisted of numerous small solid particles, an explanation now confirmed by the Voyager spacecraft.

Maxwell next considered the kinetic theory of gases. By treating gases statistically in 1866 he formulated, independently of Ludwig Boltzmann, the Maxwell-Boltzmann kinetic theory of gases. This theory showed that temperatures and heat involved only molecular movement.

This theory meant a change from a concept of certainty (heat viewed as flowing from hot to cold) to one of statistics (molecules at high temperature have only a high probability of moving toward those at low temperature). Maxwell’s approach did not reject the earlier studies of thermodynamics but used a better theory of the basis to explain the observations and experiments.

Maxwell’s most important achievement was his extension and mathematical formulation of Michael Faraday’s theories of electricity and magnetic lines of force. His paper On Faraday’s lines of force was read to the Cambridge Philosophical Society in two parts, 1855 and 1856. Maxwell showed that a few relatively simple mathematical equations could express the behavior of electric and magnetic fields and their interrelation.

The four partial differential equations, now known as Maxwell’s equations,
first appeared in fully developed form in Treatise on Electricity and Magnetism
(1873). They are one of the great achievements of nineteenth-century mathematical
physics. By solving these equations, Maxwell predicted the existence of electromagnetic
waves and the fact that these waves propagate at the speed of light (1862). He
therefore proposed that light is an electromagnetic phenomenon.

[Portrait: James Clerk Maxwell, 1831-1879]

Maxwell left King’s College, London, in the spring of 1865 and returned to 
his Scottish estate. He made periodic trips to Cambridge and, rather reluctantly, 
accepted an offer from Cambridge to be the first Cavendish Professor of Physics in 
1871. He designed the Cavendish laboratory and helped set it up. 


15.5 Problems 


15.1. Show that the curl of the gradient of a function is always zero. 

15.2. Show that the divergence of the curl of a vector is always zero. 

15.3. Verify Equation (15.19) component by component. 

15.4. Provide the details of Example 15.3.1: 

(a) Compute the three components of L and verify Equation (15.20). 

(b) Calculate $L_x^2 f$, $L_y^2 f$, $L_z^2 f$ and show that you obtain the expressions given
in the example.

(c) Verify that $L^2 f$ is as given in Equation (15.21).

(d) Show that $A = (\mathbf{r}\cdot\nabla)^2 f - r^2\nabla^2 f$ and obtain (15.22). Here $A$ is defined
by the sum of the expressions in the two pairs of parentheses in Equation
(15.21).

15.5. By taking each component of $d\mathbf{r}$ separately in a convenient coordinate
system, show that its integral around any closed loop vanishes.


15.6. Recall that the total magnetic force on a current loop is given by

$$\mathbf{F} = I\oint d\mathbf{r}\times\mathbf{B}.$$

Show that the total force on a current loop located in a homogeneous magnetic
field is zero.


15.7. Derive the differential form of Maxwell’s last equation from the
corresponding integral form.

15.8. Starting with Maxwell’s equations, show that the magnetic field satisfies
the same wave equation as the electric field. In particular, show that it, too,
propagates with the same speed.

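15.9. Consider $\mathbf{E} = \mathbf{E}_0 e^{i(\omega t - \mathbf{k}\cdot\mathbf{r})}$ and $\mathbf{B} = \mathbf{B}_0 e^{i(\omega t - \mathbf{k}\cdot\mathbf{r})}$, where $i = \sqrt{-1}$, and $\mathbf{E}_0$,
$\mathbf{B}_0$, $\mathbf{k}$, and $\omega$ are constants. The $\mathbf{E}$ and the $\mathbf{B}$ so defined represent plane waves
moving in the direction of the vector $\mathbf{k}$.

(a) Show that they satisfy Maxwell’s equations in free space if:

(1) $\mathbf{k}\cdot\mathbf{E}_0 = 0$; (2) $\mathbf{k}\cdot\mathbf{B}_0 = 0$;

(3) $\mathbf{k}\times\mathbf{E}_0 = \omega\mathbf{B}_0$; (4) $\mathbf{k}\times\mathbf{B}_0 = -\dfrac{\omega}{c^2}\mathbf{E}_0$.

(b) In particular, show that $\mathbf{k}$, the propagation direction, and $\mathbf{E}$ and $\mathbf{B}$ form
a mutually perpendicular set of vectors.

(c) By taking the cross product of $\mathbf{k}$ with an appropriate equation, show that
$|\mathbf{k}| = \omega/c$.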

15.10. Derive Equation (15.34). 




Chapter 16

Curvilinear Vector Analysis


All the vector analytical quantities discussed in the previous chapters can 
also be calculated in other coordinate systems. The general procedure is to 
start with definitions of quantities in a coordinate-free way and substitute the 
known quantities in terms of the particular coordinates we are interested in 
and “read off” the vector analytic quantity. Instead of treating cylindrical and 
spherical coordinate systems separately, we lump them together and derive
relations that hold not only in the three familiar coordinate systems, but also in
all coordinate systems whose unit vectors form a set of right-handed mutually 
perpendicular vectors. Since the geometric definitions of all vector-analytic 
quantities involve elements of length, we start with the length elements. 

16.1 Elements of Length 

Consider curvilinear coordinates¹ $(q_1, q_2, q_3)$ in which the primary line
elements are given by

$$dl_1 = h_1(q_1,q_2,q_3)\,dq_1,\qquad dl_2 = h_2(q_1,q_2,q_3)\,dq_2,\qquad dl_3 = h_3(q_1,q_2,q_3)\,dq_3,$$

where $h_1$, $h_2$, and $h_3$ are some functions of the coordinates. By examining the
primary line elements in Cartesian, spherical, and cylindrical coordinates, we
can come up with Table 16.1.

Denoting the unit vectors in curvilinear coordinate systems by $\hat{\mathbf{e}}_1$, $\hat{\mathbf{e}}_2$, and
$\hat{\mathbf{e}}_3$, we can combine all the equations for the elements of length and write
them as a single vector equation:

$$d\mathbf{r} = d\mathbf{l} = \hat{\mathbf{e}}_1\,dl_1 + \hat{\mathbf{e}}_2\,dl_2 + \hat{\mathbf{e}}_3\,dl_3 = \hat{\mathbf{e}}_1 h_1\,dq_1 + \hat{\mathbf{e}}_2 h_2\,dq_2 + \hat{\mathbf{e}}_3 h_3\,dq_3. \tag{16.1}$$

1 As will be seen shortly, Cartesian coordinates are also included in such curvilinear 
coordinates. The former have lines (and planes) as their primary lengths and surfaces, thus 
the word “linear” in the name of the latter. 


| Curvilinear | Cartesian | Spherical | Cylindrical |
|---|---|---|---|
| $q_1$ | $x$ | $r$ | $\rho$ |
| $q_2$ | $y$ | $\theta$ | $\varphi$ |
| $q_3$ | $z$ | $\varphi$ | $z$ |
| $h_1$ | $1$ | $1$ | $1$ |
| $h_2$ | $1$ | $r$ | $\rho$ |
| $h_3$ | $1$ | $r\sin\theta$ | $1$ |

Table 16.1: The specifications of the three coordinate systems in terms of curvilinear
coordinates.


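This equation is useful in its own right. For example, we can obtain the
curvilinear unit vectors as follows. Rewrite Equation (16.1) in terms of increments:

$$\Delta\mathbf{r} \approx \hat{\mathbf{e}}_1 h_1\,\Delta q_1 + \hat{\mathbf{e}}_2 h_2\,\Delta q_2 + \hat{\mathbf{e}}_3 h_3\,\Delta q_3.$$

Keeping $q_2$ and $q_3$ constant (so that $\Delta q_2 = 0 = \Delta q_3$), divide both sides by
$\Delta q_1$ to obtain

$$\frac{\Delta\mathbf{r}}{\Delta q_1} \approx \hat{\mathbf{e}}_1 h_1.$$

In the limit, the LHS becomes a partial derivative and we get

$$\hat{\mathbf{e}}_1 = \frac{1}{h_1}\frac{\partial\mathbf{r}}{\partial q_1}. \tag{16.2}$$

The other two unit vectors can be obtained similarly. We thus have

Box 16.1.1. The $i$th unit vector of a curvilinear coordinate system is
given by

$$\hat{\mathbf{e}}_i = \frac{1}{h_i}\frac{\partial\mathbf{r}}{\partial q_i},\qquad i = 1,2,3. \tag{16.3}$$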


This is a useful formula for obtaining the Cartesian components of curvilinear 
unit vectors, when the Cartesian components of the position vector are given 
in terms of curvilinear coordinates. 

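Example 16.1.1. As an illustration of the above procedure, we calculate the unit
vectors in spherical coordinates. First we write

$$\mathbf{r} = x\hat{\mathbf{e}}_x + y\hat{\mathbf{e}}_y + z\hat{\mathbf{e}}_z = \hat{\mathbf{e}}_x r\sin\theta\cos\varphi + \hat{\mathbf{e}}_y r\sin\theta\sin\varphi + \hat{\mathbf{e}}_z r\cos\theta.$$

Now we differentiate with respect to $r$ to get

$$\hat{\mathbf{e}}_1 = \hat{\mathbf{e}}_r = \frac{\partial\mathbf{r}}{\partial r} = \hat{\mathbf{e}}_x\sin\theta\cos\varphi + \hat{\mathbf{e}}_y\sin\theta\sin\varphi + \hat{\mathbf{e}}_z\cos\theta.$$

Similarly,

$$\hat{\mathbf{e}}_2 = \hat{\mathbf{e}}_\theta = \frac{1}{r}\frac{\partial\mathbf{r}}{\partial\theta} = \hat{\mathbf{e}}_x\cos\theta\cos\varphi + \hat{\mathbf{e}}_y\cos\theta\sin\varphi - \hat{\mathbf{e}}_z\sin\theta,$$

$$\hat{\mathbf{e}}_3 = \hat{\mathbf{e}}_\varphi = \frac{1}{r\sin\theta}\frac{\partial\mathbf{r}}{\partial\varphi} = -\hat{\mathbf{e}}_x\sin\varphi + \hat{\mathbf{e}}_y\cos\varphi,$$

where we have used Table 16.1. These are the results we obtained in Chapter 1 from
purely geometric arguments. ■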
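Box 16.1.1 lends itself to a quick numerical check. The sketch below is our own illustration, not part of the text: it approximates $\partial\mathbf{r}/\partial q_i$ by central differences for spherical coordinates. Because $\hat{\mathbf{e}}_i$ is a unit vector, (16.3) also implies $h_i = |\partial\mathbf{r}/\partial q_i|$, which is how the scale factors are computed here. The function names, the sample point, and the step size are arbitrary choices.

```python
import math

def position_spherical(r, theta, phi):
    """Cartesian components (x, y, z) of the position vector in spherical coordinates."""
    return (r * math.sin(theta) * math.cos(phi),
            r * math.sin(theta) * math.sin(phi),
            r * math.cos(theta))

def partial_r(pos, q, i, step=1e-6):
    """Central-difference estimate of the vector dr/dq_i at the point q."""
    q_plus, q_minus = list(q), list(q)
    q_plus[i] += step
    q_minus[i] -= step
    p, m = pos(*q_plus), pos(*q_minus)
    return tuple((a - b) / (2 * step) for a, b in zip(p, m))

def scale_factors_and_unit_vectors(pos, q):
    """Return [(h_i, e_i)] with h_i = |dr/dq_i| and e_i = (1/h_i) dr/dq_i, as in (16.3)."""
    out = []
    for i in range(3):
        d = partial_r(pos, q, i)
        h = math.sqrt(sum(c * c for c in d))
        out.append((h, tuple(c / h for c in d)))
    return out

q = (2.0, 0.7, 1.2)  # (r, theta, phi)
(h1, e_r), (h2, e_theta), (h3, e_phi) = scale_factors_and_unit_vectors(position_spherical, q)
print(h1, h2, h3)  # should be close to 1, r, r*sin(theta)
print(e_phi)       # should be close to (-sin(phi), cos(phi), 0)
```

Only the position function needs to change to treat another coordinate system, which mirrors the "read off from $\mathbf{r}(q_1,q_2,q_3)$" spirit of the box.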

We are now in a position to find the gradient of a function and the divergence and
curl of a vector field in general curvilinear coordinates. Once these are obtained,
finding their specific forms in cylindrical and spherical coordinates entails
simply substituting the appropriate expressions for $q_1$, $q_2$, and $q_3$ and $h_1$, $h_2$,
and $h_3$.


16.2 The Gradient 

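The gradient is found by equating

$$df = \frac{\partial f}{\partial q_1}dq_1 + \frac{\partial f}{\partial q_2}dq_2 + \frac{\partial f}{\partial q_3}dq_3$$

to the differential of $f$ in terms of the gradient:

$$df = \nabla f\cdot d\mathbf{r} = (\nabla f)_1 h_1\,dq_1 + (\nabla f)_2 h_2\,dq_2 + (\nabla f)_3 h_3\,dq_3.$$

The last two equations yield

$$(\nabla f)_1 h_1 = \frac{\partial f}{\partial q_1},\qquad (\nabla f)_2 h_2 = \frac{\partial f}{\partial q_2},\qquad (\nabla f)_3 h_3 = \frac{\partial f}{\partial q_3},$$

which gives

Box 16.2.1. The gradient of a function $f$ in a curvilinear coordinate
system is given by

$$\nabla f = \hat{\mathbf{e}}_1\frac{1}{h_1}\frac{\partial f}{\partial q_1} + \hat{\mathbf{e}}_2\frac{1}{h_2}\frac{\partial f}{\partial q_2} + \hat{\mathbf{e}}_3\frac{1}{h_3}\frac{\partial f}{\partial q_3}. \tag{16.4}$$

This result, in conjunction with Table 16.1, agrees with the expression obtained
for the gradient in the Cartesian coordinate system. In cylindrical
coordinates, we obtain

$$\nabla f = \hat{\mathbf{e}}_\rho\frac{\partial f}{\partial\rho} + \hat{\mathbf{e}}_\varphi\frac{1}{\rho}\frac{\partial f}{\partial\varphi} + \hat{\mathbf{e}}_z\frac{\partial f}{\partial z}, \tag{16.5}$$

so that the operator $\nabla$ in cylindrical coordinates is given by

$$\nabla = \hat{\mathbf{e}}_\rho\frac{\partial}{\partial\rho} + \hat{\mathbf{e}}_\varphi\frac{1}{\rho}\frac{\partial}{\partial\varphi} + \hat{\mathbf{e}}_z\frac{\partial}{\partial z}. \tag{16.6}$$

Similarly, in spherical coordinates, we get

$$\nabla f = \hat{\mathbf{e}}_r\frac{\partial f}{\partial r} + \hat{\mathbf{e}}_\theta\frac{1}{r}\frac{\partial f}{\partial\theta} + \hat{\mathbf{e}}_\varphi\frac{1}{r\sin\theta}\frac{\partial f}{\partial\varphi}, \tag{16.7}$$

with the operator $\nabla$ given by

$$\nabla = \hat{\mathbf{e}}_r\frac{\partial}{\partial r} + \hat{\mathbf{e}}_\theta\frac{1}{r}\frac{\partial}{\partial\theta} + \hat{\mathbf{e}}_\varphi\frac{1}{r\sin\theta}\frac{\partial}{\partial\varphi}. \tag{16.8}$$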
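To see (16.4) at work with the cylindrical entries of Table 16.1 ($h_1 = 1$, $h_2 = \rho$, $h_3 = 1$), the sketch below, an illustration with an arbitrarily chosen test function, computes the curvilinear components of $\nabla f$ by central differences and compares them with the Cartesian gradient projected onto $\hat{\mathbf{e}}_\rho = (\cos\varphi, \sin\varphi, 0)$, $\hat{\mathbf{e}}_\varphi = (-\sin\varphi, \cos\varphi, 0)$, and $\hat{\mathbf{e}}_z$.

```python
import math

def f_cart(x, y, z):
    """Arbitrary smooth test function, written in Cartesian coordinates."""
    return x * x * y + math.sin(z) * x

def grad_cart(x, y, z):
    """Its Cartesian gradient, worked out by hand."""
    return (2 * x * y + math.sin(z), x * x, math.cos(z) * x)

def f_cyl(rho, phi, z):
    """The same function expressed in cylindrical coordinates."""
    return f_cart(rho * math.cos(phi), rho * math.sin(phi), z)

def d(func, q, i, step=1e-6):
    """Central-difference partial derivative of func with respect to its i-th argument."""
    qp, qm = list(q), list(q)
    qp[i] += step
    qm[i] -= step
    return (func(*qp) - func(*qm)) / (2 * step)

def grad_cyl(func, rho, phi, z):
    """Components (rho, phi, z) of grad f from (16.4) with h = (1, rho, 1)."""
    q = (rho, phi, z)
    h = (1.0, rho, 1.0)
    return tuple(d(func, q, i) / h[i] for i in range(3))

rho, phi, z = 1.5, 0.4, -0.3
gx, gy, gz = grad_cart(rho * math.cos(phi), rho * math.sin(phi), z)
projected = (gx * math.cos(phi) + gy * math.sin(phi),   # grad f . e_rho
             -gx * math.sin(phi) + gy * math.cos(phi),  # grad f . e_phi
             gz)                                        # grad f . e_z
numerical = grad_cyl(f_cyl, rho, phi, z)
print(numerical)
print(projected)
```

The two printed triples should agree to roughly the accuracy of the finite differences; the factor $1/\rho$ in the $\varphi$-component is exactly what the scale factor $h_2 = \rho$ supplies.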


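Example 16.2.1. The electrostatic potential of an electric dipole was given in
Example 10.5.1 in spherical coordinates. With the expression for the gradient given
above, we can find the electric field $\mathbf{E} = -\nabla\Phi$ of a dipole in spherical coordinates:

$$E_r = -\frac{\partial\Phi_{\rm dip}}{\partial r} = -\frac{\partial}{\partial r}\left(\frac{k_e p\cos\theta}{r^2}\right) = \frac{2k_e p\cos\theta}{r^3},$$

$$E_\theta = -\frac{1}{r}\frac{\partial\Phi_{\rm dip}}{\partial\theta} = -\frac{1}{r}\frac{\partial}{\partial\theta}\left(\frac{k_e p\cos\theta}{r^2}\right) = \frac{k_e p\sin\theta}{r^3},$$

$$E_\varphi = -\frac{1}{r\sin\theta}\frac{\partial\Phi_{\rm dip}}{\partial\varphi} = -\frac{1}{r\sin\theta}\frac{\partial}{\partial\varphi}\left(\frac{k_e p\cos\theta}{r^2}\right) = 0.$$

Summarizing, we have

$$\mathbf{E}_{\rm dip} = \frac{k_e p}{r^3}\left(2\hat{\mathbf{e}}_r\cos\theta + \hat{\mathbf{e}}_\theta\sin\theta\right). \tag{16.9}$$

This is the characteristic field of a dipole.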
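A short numerical check of this example: the sketch below differentiates $\Phi_{\rm dip} = k_e p\cos\theta/r^2$ by central differences, divides by the spherical scale factors $(1, r, r\sin\theta)$ as in (16.7), and compares the result with the closed form. Setting $k_e p = 1$ is an arbitrary choice of units made only for this illustration.

```python
import math

KE_P = 1.0  # k_e * p, set to 1 for illustration

def phi_dip(r, theta, phi):
    """Dipole potential k_e p cos(theta) / r^2 (independent of the azimuth phi)."""
    return KE_P * math.cos(theta) / r**2

def d(func, q, i, step=1e-6):
    """Central-difference partial derivative with respect to the i-th coordinate."""
    qp, qm = list(q), list(q)
    qp[i] += step
    qm[i] -= step
    return (func(*qp) - func(*qm)) / (2 * step)

def field_from_potential(r, theta, phi):
    """E = -grad(Phi) using the spherical scale factors h = (1, r, r sin(theta))."""
    q = (r, theta, phi)
    h = (1.0, r, r * math.sin(theta))
    return tuple(-d(phi_dip, q, i) / h[i] for i in range(3))

def field_closed_form(r, theta):
    """Equation (16.9): (k_e p / r^3) (2 cos(theta), sin(theta), 0)."""
    return (2 * KE_P * math.cos(theta) / r**3, KE_P * math.sin(theta) / r**3, 0.0)

print(field_from_potential(1.7, 0.9, 0.3))
print(field_closed_form(1.7, 0.9))
```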


Example 16.2.2. Just as electric charges can produce electric dipoles, electric
currents can produce magnetic dipoles. We saw this in Subsection 15.2. In this
example, we will calculate the magnetic field of a dipole directly. Consider the
magnetic field of a circular loop of current as given in Equations (4.24) and (4.26).
We change the coordinates of the field point $P$ to spherical and assume that $P$ is far
away from the loop, i.e., that $a$ is very small compared to $r$. Writing $r^2$ for $\rho^2 + z^2$
and $r\sin\theta$ for $\rho$, we expand the integrands of (4.24) and (4.26) in powers of $a/r$,
keeping only the first nonzero power. Thus,

$$\frac{1}{(r^2 + a^2 - 2ra\sin\theta\cos t)^{3/2}} = \frac{1}{r^3}\left[1 + \left(\frac{a}{r}\right)^2 - 2\left(\frac{a}{r}\right)\sin\theta\cos t\right]^{-3/2} \approx \frac{1}{r^3}\left[1 + 3\left(\frac{a}{r}\right)\sin\theta\cos t\right] + \cdots,$$

$$\frac{r\sin\theta\cos t - a}{(r^2 + a^2 - 2ra\sin\theta\cos t)^{3/2}} = \frac{1}{r^2}\left(\sin\theta\cos t - \frac{a}{r}\right)\left[1 + \left(\frac{a}{r}\right)^2 - 2\left(\frac{a}{r}\right)\sin\theta\cos t\right]^{-3/2}$$

$$\approx \frac{1}{r^2}\left(\sin\theta\cos t - \frac{a}{r}\right)\left[1 + 3\left(\frac{a}{r}\right)\sin\theta\cos t\right] + \cdots \approx \frac{1}{r^2}\left(\sin\theta\cos t - \frac{a}{r} + \frac{3a}{r}\sin^2\theta\cos^2 t\right).$$

Substituting these in the integrals of (4.24) and (4.26) yields

$$B_\rho = \frac{k_m I a z}{r^3}\int_0^{2\pi}\cos t\left(1 + \frac{3a}{r}\sin\theta\cos t\right)dt = \frac{3k_m I\pi a^2\cos\theta\sin\theta}{r^3},$$

where we substituted $r\cos\theta$ for $z$. In an analogous way, we also obtain

$$B_z = -\frac{k_m I a}{r^2}\int_0^{2\pi}\left(\sin\theta\cos t - \frac{a}{r} + \frac{3a}{r}\sin^2\theta\cos^2 t\right)dt = \frac{k_m I a}{r^2}\left(\frac{2\pi a}{r} - \frac{3\pi a}{r}\sin^2\theta\right).$$

We are interested in the spherical components of the magnetic field. To find
these components, we first write

$$\mathbf{B} = B_\rho\hat{\mathbf{e}}_\rho + B_z\hat{\mathbf{e}}_z$$

and take the dot product with appropriate unit vectors:

$$B_r = \mathbf{B}\cdot\hat{\mathbf{e}}_r = B_\rho\hat{\mathbf{e}}_\rho\cdot\hat{\mathbf{e}}_r + B_z\hat{\mathbf{e}}_z\cdot\hat{\mathbf{e}}_r = B_\rho\sin\theta + B_z\cos\theta$$

$$= \frac{3k_m I\pi a^2\cos\theta\sin\theta}{r^3}\sin\theta + \frac{k_m I a}{r^2}\left(\frac{2\pi a}{r} - \frac{3\pi a}{r}\sin^2\theta\right)\cos\theta = \frac{2k_m I\pi a^2}{r^3}\cos\theta.$$

Similarly,

$$B_\theta = \mathbf{B}\cdot\hat{\mathbf{e}}_\theta = B_\rho\hat{\mathbf{e}}_\rho\cdot\hat{\mathbf{e}}_\theta + B_z\hat{\mathbf{e}}_z\cdot\hat{\mathbf{e}}_\theta = B_\rho\cos\theta - B_z\sin\theta$$

$$= \frac{3k_m I\pi a^2\cos\theta\sin\theta}{r^3}\cos\theta - \frac{k_m I a}{r^2}\left(\frac{2\pi a}{r} - \frac{3\pi a}{r}\sin^2\theta\right)\sin\theta = \frac{k_m I\pi a^2}{r^3}\sin\theta.$$

Summarizing, we write

$$\mathbf{B} = \frac{k_m I\pi a^2}{r^3}\left(2\hat{\mathbf{e}}_r\cos\theta + \hat{\mathbf{e}}_\theta\sin\theta\right). \tag{16.10}$$

This has a striking resemblance to Equation (16.9). In fact, once we identify $I\pi a^2$
as the magnetic dipole moment of the loop, and change all magnetic labels to electric ones,
we recover Equation (16.9). ■
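The far-field result can be tested against the full loop integrals. The sketch below assumes, as the expansions in this example indicate, that (4.24) and (4.26) have the form $B_\rho = k_m I a z\int_0^{2\pi}\cos t\,D^{-3/2}\,dt$ and $B_z = -k_m I a\int_0^{2\pi}(\rho\cos t - a)\,D^{-3/2}\,dt$ with $D = r^2 + a^2 - 2ra\sin\theta\cos t$. It evaluates them with the trapezoidal rule (which is very accurate for smooth periodic integrands), converts to spherical components as above, and compares with (16.10). The value $k_m I = 1$ and the sample point are illustrative choices.

```python
import math

KM_I = 1.0  # k_m * I, set to 1 for illustration

def loop_field(r, theta, a, n=2000):
    """Full (B_rho, B_z) of a circular loop of radius a, by the trapezoidal rule in t."""
    rho, z = r * math.sin(theta), r * math.cos(theta)
    sum_rho = sum_z = 0.0
    dt = 2 * math.pi / n
    for k in range(n):
        t = k * dt
        denom = (r * r + a * a - 2 * r * a * math.sin(theta) * math.cos(t)) ** 1.5
        sum_rho += math.cos(t) / denom
        sum_z += (rho * math.cos(t) - a) / denom
    return KM_I * a * z * sum_rho * dt, -KM_I * a * sum_z * dt

def dipole_field(r, theta, a):
    """Equation (16.10): (k_m I pi a^2 / r^3) (2 cos(theta), sin(theta))."""
    c = KM_I * math.pi * a * a / r**3
    return 2 * c * math.cos(theta), c * math.sin(theta)

r, theta, a = 500.0, 0.8, 1.0  # field point far from the loop (a/r = 0.002)
b_rho, b_z = loop_field(r, theta, a)
b_r = b_rho * math.sin(theta) + b_z * math.cos(theta)
b_theta = b_rho * math.cos(theta) - b_z * math.sin(theta)
print((b_r, b_theta))
print(dipole_field(r, theta, a))
```

Moving the field point closer to the loop (smaller $r/a$) makes the discrepancy grow, as expected for an approximation that keeps only the leading power of $a/r$.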


16.3 The Divergence 

Figure 16.1: Point $P$ and the surrounding volume element in curvilinear coordinates.
Note that the midpoints of the front and back faces are $\Delta q_1/2$ away from $P$ in the
positive and negative $\hat{\mathbf{e}}_1$ directions, respectively. Similarly for the other four faces.

To find the divergence of a vector $\mathbf{A}$, we consider the volume element of
Figure 16.1 and find the outward flux through the sides of the volume. For
the front face we have

$$\Delta\phi_f = \mathbf{A}_f\cdot\hat{\mathbf{e}}_1\,\Delta a_f,$$


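where $\mathbf{A}_f$ means the value of $\mathbf{A}$ at the center of the front face and $\Delta a_f$ is the
area of the front face. Following the arguments presented for the Cartesian
case, we write

$$\Delta\phi_f \approx \mathbf{A}_f\cdot\hat{\mathbf{e}}_1\,\Delta a_f = A_{1f}\,\Delta l_{2f}\,\Delta l_{3f} = A_{1f}(h_2\Delta q_2)_f(h_3\Delta q_3)_f = A_{1f}h_{2f}h_{3f}\,\Delta q_2\Delta q_3.$$

The subscript 1 in $A_{1f}$, for example, means the component of $\mathbf{A}$ in the direction
of the first coordinate. The subscript $f$ implies evaluation, at the midpoint,
on the front side whose second and third coordinates are the same as those of $P$, and
whose first coordinate is $q_1 + \Delta q_1/2$. Thus, we have

$$\Delta\phi_f \approx A_1\!\left(q_1 + \frac{\Delta q_1}{2}, q_2, q_3\right) h_2\!\left(q_1 + \frac{\Delta q_1}{2}, q_2, q_3\right) h_3\!\left(q_1 + \frac{\Delta q_1}{2}, q_2, q_3\right)\Delta q_2\Delta q_3$$

because, unlike the Cartesian case, $h_1$, $h_2$, and $h_3$ are functions of the
coordinates. Using Taylor series expansion for the functions $A_1$, $h_2$, and $h_3$
yields

$$\Delta\phi_f \approx \left\{A_1 + \frac{\Delta q_1}{2}\frac{\partial A_1}{\partial q_1}\right\}\left\{h_2 + \frac{\Delta q_1}{2}\frac{\partial h_2}{\partial q_1}\right\}\left\{h_3 + \frac{\Delta q_1}{2}\frac{\partial h_3}{\partial q_1}\right\}\Delta q_2\Delta q_3.$$

Multiplying out and keeping terms up to the third order (corresponding to
the order of a volume element by which we shall divide shortly), we obtain

$$\Delta\phi_f \approx \left\{A_1h_2h_3 + \left(A_1h_2\frac{\partial h_3}{\partial q_1} + A_1h_3\frac{\partial h_2}{\partial q_1} + h_2h_3\frac{\partial A_1}{\partial q_1}\right)\frac{\Delta q_1}{2}\right\}\Delta q_2\Delta q_3$$

$$= \left\{h_2h_3A_1 + \frac{\partial}{\partial q_1}(h_2h_3A_1)\frac{\Delta q_1}{2}\right\}\Delta q_2\Delta q_3,$$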

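where we left out the explicit dependence of the functions on their independent
coordinate variables. For the back face we have

$$\Delta\phi_b \approx \mathbf{A}_b\cdot(-\hat{\mathbf{e}}_1\,\Delta a_b) = -A_{1b}\,\Delta l_{2b}\,\Delta l_{3b} = -A_{1b}(h_2\Delta q_2)_b(h_3\Delta q_3)_b$$

$$= -A_1\!\left(q_1 - \frac{\Delta q_1}{2}, q_2, q_3\right) h_2\!\left(q_1 - \frac{\Delta q_1}{2}, q_2, q_3\right) h_3\!\left(q_1 - \frac{\Delta q_1}{2}, q_2, q_3\right)\Delta q_2\Delta q_3.$$

Taylor expanding the three functions $A_1$, $h_2$, and $h_3$ as above, and multiplying
out yields

$$\Delta\phi_b \approx -\left\{A_1h_2h_3 - \frac{\partial}{\partial q_1}(h_2h_3A_1)\frac{\Delta q_1}{2}\right\}\Delta q_2\Delta q_3.$$

Adding the front and back contributions, we obtain

$$\Delta\phi_1 = \Delta\phi_f + \Delta\phi_b \approx \frac{\partial}{\partial q_1}(h_2h_3A_1)\,\Delta q_1\Delta q_2\Delta q_3.$$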
oqi 

Similarly, the fluxes through the faces perpendicular to $\hat{\mathbf{e}}_2$ and $\hat{\mathbf{e}}_3$ are

$$\Delta\phi_2 \approx \frac{\partial}{\partial q_2}(h_1h_3A_2)\,\Delta q_1\Delta q_2\Delta q_3,\qquad \Delta\phi_3 \approx \frac{\partial}{\partial q_3}(h_1h_2A_3)\,\Delta q_1\Delta q_2\Delta q_3. \tag{16.11}$$

Adding the three contributions and dividing by the volume

$$\Delta V = \Delta l_1\Delta l_2\Delta l_3 = h_1h_2h_3\,\Delta q_1\Delta q_2\Delta q_3,$$

and finally taking the limit of smaller and smaller volumes (which turns all
approximations into equalities), we get

Theorem 16.3.1. The divergence of a vector field $\mathbf{A}$ in a curvilinear coordinate
system is given by

$$\nabla\cdot\mathbf{A} = \frac{1}{h_1h_2h_3}\left[\frac{\partial}{\partial q_1}(h_2h_3A_1) + \frac{\partial}{\partial q_2}(h_1h_3A_2) + \frac{\partial}{\partial q_3}(h_1h_2A_3)\right].$$
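Now that we have a general formula for the divergence, we can use Table
16.1 to write the divergence in a specific coordinate system. For instance,
substituting the entries of the second column gives the formula in Theorem
13.2.1, and the third column yields

$$\nabla\cdot\mathbf{A} = \frac{1}{r^2\sin\theta}\left[\frac{\partial}{\partial r}(r^2\sin\theta\,A_r) + \frac{\partial}{\partial\theta}(r\sin\theta\,A_\theta) + \frac{\partial}{\partial\varphi}(rA_\varphi)\right]$$

$$= \frac{1}{r^2}\frac{\partial}{\partial r}(r^2A_r) + \frac{1}{r\sin\theta}\frac{\partial}{\partial\theta}(\sin\theta\,A_\theta) + \frac{1}{r\sin\theta}\frac{\partial A_\varphi}{\partial\varphi}. \tag{16.12}$$

To obtain the divergence in cylindrical coordinates, we use the last column
and obtain

$$\nabla\cdot\mathbf{A} = \frac{1}{\rho}\left[\frac{\partial}{\partial\rho}(\rho A_\rho) + \frac{\partial}{\partial\varphi}(A_\varphi) + \frac{\partial}{\partial z}(\rho A_z)\right] = \frac{1}{\rho}\frac{\partial}{\partial\rho}(\rho A_\rho) + \frac{1}{\rho}\frac{\partial A_\varphi}{\partial\varphi} + \frac{\partial A_z}{\partial z}. \tag{16.13}$$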
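Theorem 16.3.1 can be checked numerically in the same way. The sketch below (an illustration; the test field is an arbitrary choice) takes the Cartesian field $\mathbf{A} = (xy, yz, zx)$, whose divergence is $x + y + z$, projects it onto the spherical unit vectors of Example 16.1.1, and evaluates the curvilinear formula with central differences. The helper names are hypothetical and not from the text.

```python
import math

def unit_vectors(theta, phi):
    """Spherical unit vectors e_r, e_theta, e_phi in Cartesian components (Example 16.1.1)."""
    st, ct, sp, cp = math.sin(theta), math.cos(theta), math.sin(phi), math.cos(phi)
    return ((st * cp, st * sp, ct), (ct * cp, ct * sp, -st), (-sp, cp, 0.0))

def a_cart(x, y, z):
    """Test field A = (xy, yz, zx); its Cartesian divergence is x + y + z."""
    return (x * y, y * z, z * x)

def a_spherical(r, theta, phi):
    """Components (A_r, A_theta, A_phi), obtained by projecting A onto the unit vectors."""
    x = r * math.sin(theta) * math.cos(phi)
    y = r * math.sin(theta) * math.sin(phi)
    z = r * math.cos(theta)
    a = a_cart(x, y, z)
    return tuple(sum(ai * ei for ai, ei in zip(a, e)) for e in unit_vectors(theta, phi))

def spherical_h(r, theta, phi):
    """Scale factors from Table 16.1."""
    return (1.0, r, r * math.sin(theta))

def div_curvilinear(comps, scale, q, step=1e-5):
    """Theorem 16.3.1: (1/h1h2h3) * sum_i d/dq_i (H_i A_i), where H_i = h1h2h3 / h_i."""
    def weighted(qq, i):
        h = scale(*qq)
        return h[0] * h[1] * h[2] / h[i] * comps(*qq)[i]
    total = 0.0
    for i in range(3):
        qp, qm = list(q), list(q)
        qp[i] += step
        qm[i] -= step
        total += (weighted(qp, i) - weighted(qm, i)) / (2 * step)
    h = scale(*q)
    return total / (h[0] * h[1] * h[2])

r, theta, phi = 1.3, 1.1, 0.5
x = r * math.sin(theta) * math.cos(phi)
y = r * math.sin(theta) * math.sin(phi)
z = r * math.cos(theta)
print(div_curvilinear(a_spherical, spherical_h, (r, theta, phi)))
print(x + y + z)
```

Passing the cylindrical scale factors $(1, \rho, 1)$ and cylindrical components instead would check (16.13) with the same function.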


Example 16.3.2. Consider the vector field defined by

$$\mathbf{A} = kr^a\hat{\mathbf{e}}_r,$$

where $k$ and $a$ are constants. Let us verify the divergence theorem for a spherical
surface of radius $R$ (see Figure 16.2). The total flux is obtained by integrating over
the surface of the sphere, whose outward unit normal is $\hat{\mathbf{e}}_n = \hat{\mathbf{e}}_r$:

$$\oint_S\mathbf{A}\cdot d\mathbf{a} = \int_0^{2\pi}\!\!\int_0^\pi kR^a\,\hat{\mathbf{e}}_r\cdot\hat{\mathbf{e}}_n\,R^2\sin\theta\,d\theta\,d\varphi = kR^{a+2}\int_0^{2\pi}\!\!\int_0^\pi\sin\theta\,d\theta\,d\varphi = 4\pi kR^{a+2}.$$

Figure 16.2: The element of area and its unit normal for a sphere.

On the other hand, using the expression for divergence in the spherical coordinate
system and noting that $A_\theta = 0 = A_\varphi$, we obtain

$$\nabla\cdot\mathbf{A} = \frac{1}{r^2}\frac{\partial}{\partial r}(r^2A_r) = \frac{1}{r^2}\frac{\partial}{\partial r}(kr^{a+2}) = (a+2)kr^{a-1},$$

where we have assumed that $a \neq -2$. Therefore, for $a > -2$ (so that the radial
integral converges at $r = 0$),

$$\int_V\nabla\cdot\mathbf{A}\,dV = \int_0^R(a+2)kr^{a-1}r^2\,dr\int_0^\pi\sin\theta\,d\theta\int_0^{2\pi}d\varphi = 4\pi kR^{a+2},$$

which agrees with the surface integration.

For $a = -2$ the divergence appears to vanish everywhere. However, a closer
examination reveals that the statement is true only if $r \neq 0$. In fact, as we discussed
before, the divergence of $\mathbf{A}$ is proportional to the Dirac delta function, $\delta(\mathbf{r})$ in this
case [see Equation (15.2)]. ■
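Here is a numerical version of this example, a sketch with arbitrarily chosen values $k = 2$, $R = 1.3$, and exponent $1.5$ (called `alpha` in the code to avoid a clash with other symbols). The volume integral of the divergence from (16.12) is done with the midpoint rule and compared with the surface flux. The inverse-square case shows the point of the last paragraph: the divergence computed away from the origin vanishes, while the flux through the sphere is $4\pi k$.

```python
import math

def divergence_radial(a_r, r, step=1e-6):
    """(1/r^2) d/dr (r^2 A_r): the only nonzero term of (16.12) for a radial field."""
    def g(s):
        return s * s * a_r(s)
    return (g(r + step) - g(r - step)) / (2 * step) / (r * r)

def volume_integral(a_r, radius, n=4000):
    """Integral of div A over a ball of the given radius (midpoint rule in r)."""
    dr = radius / n
    total = 0.0
    for i in range(n):
        r = (i + 0.5) * dr
        total += divergence_radial(a_r, r) * r * r * dr
    return 4 * math.pi * total  # the theta and phi integrals contribute 4*pi

def surface_flux(a_r, radius):
    """A is normal to the sphere with constant magnitude there: flux = 4*pi*R^2*A_r(R)."""
    return 4 * math.pi * radius**2 * a_r(radius)

k, alpha, R = 2.0, 1.5, 1.3

def power_field(r):
    return k * r**alpha

def inverse_square(r):  # the case a = -2
    return k * r**-2

print(volume_integral(power_field, R), surface_flux(power_field, R))
print(divergence_radial(inverse_square, 0.5), surface_flux(inverse_square, R))
```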



16.4 The Curl 


To calculate the curl, we choose a closed path perpendicular to one of the unit
vectors, say $\hat{\mathbf{e}}_1$, and calculate the line integral of $\mathbf{A}$ around it. The situation is
depicted in Figure 16.3. We calculate the contribution to the line integral from
path (1) in detail and leave calculation of contributions from the remaining
three paths to the reader. In all calculations, terms of higher order than the
second will be omitted:

$$\int_{(1)}\mathbf{A}\cdot d\mathbf{r} \approx \mathbf{A}_{(1)}\cdot\Delta\mathbf{r}_{(1)} = \mathbf{A}_{(1)}\cdot(-\hat{\mathbf{e}}_3\,\Delta l_{(1)}) = -A_{3(1)}\,\Delta l_{(1)} = -A_{3(1)}h_{3(1)}\,\Delta q_3$$

$$= -A_3\!\left(q_1, q_2 - \frac{\Delta q_2}{2}, q_3\right) h_3\!\left(q_1, q_2 - \frac{\Delta q_2}{2}, q_3\right)\Delta q_3$$

$$\approx -\left\{A_3 - \frac{\Delta q_2}{2}\frac{\partial A_3}{\partial q_2}\right\}\left\{h_3 - \frac{\Delta q_2}{2}\frac{\partial h_3}{\partial q_2}\right\}\Delta q_3 \approx -A_3h_3\,\Delta q_3 + \frac{\partial}{\partial q_2}(h_3A_3)\frac{\Delta q_2}{2}\Delta q_3.$$

Figure 16.3: Path of integration for the first component of the curl of $\mathbf{A}$ in curvilinear
coordinates.


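Following similar steps, the reader may check that

$$\int_{(2)}\mathbf{A}\cdot d\mathbf{r} \approx A_3h_3\,\Delta q_3 + \frac{\partial}{\partial q_2}(h_3A_3)\frac{\Delta q_2}{2}\Delta q_3,$$

$$\int_{(3)}\mathbf{A}\cdot d\mathbf{r} \approx A_2h_2\,\Delta q_2 - \frac{\partial}{\partial q_3}(h_2A_2)\frac{\Delta q_3}{2}\Delta q_2,$$

$$\int_{(4)}\mathbf{A}\cdot d\mathbf{r} \approx -A_2h_2\,\Delta q_2 - \frac{\partial}{\partial q_3}(h_2A_2)\frac{\Delta q_3}{2}\Delta q_2.$$

Summing up all these contributions, we obtain

$$\oint\mathbf{A}\cdot d\mathbf{r} \approx \left[\frac{\partial}{\partial q_2}(h_3A_3) - \frac{\partial}{\partial q_3}(h_2A_2)\right]\Delta q_2\Delta q_3.$$

Dividing this by the area enclosed by the path

$$\Delta a = \Delta l_2\Delta l_3 = h_2h_3\,\Delta q_2\Delta q_3, \tag{16.14}$$

we obtain the first component, the component along the unit normal to the
area:

$$(\nabla\times\mathbf{A})_1 = \frac{1}{h_2h_3}\left[\frac{\partial}{\partial q_2}(h_3A_3) - \frac{\partial}{\partial q_3}(h_2A_2)\right].$$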

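Corresponding expressions for the other two components of the curl can
be found by proceeding as above. We can put all of the components together
in a mnemonic determinant form:

Theorem 16.4.1. The curl of a vector field $\mathbf{A}$ in a curvilinear coordinate
system is given by

$$\nabla\times\mathbf{A} = \frac{1}{h_1h_2h_3}\begin{vmatrix}\hat{\mathbf{e}}_1h_1 & \hat{\mathbf{e}}_2h_2 & \hat{\mathbf{e}}_3h_3\\ \dfrac{\partial}{\partial q_1} & \dfrac{\partial}{\partial q_2} & \dfrac{\partial}{\partial q_3}\\ h_1A_1 & h_2A_2 & h_3A_3\end{vmatrix}. \tag{16.15}$$

Note that $\nabla\times\mathbf{A}$ is not a cross product (except in Cartesian coordinates),
but a vector defined by the determinant on the RHS of (16.15).

If we substitute the appropriate values for the $h$'s and $q$'s in spherical
coordinates, we obtain

$$\nabla\times\mathbf{A} = \frac{1}{r^2\sin\theta}\begin{vmatrix}\hat{\mathbf{e}}_r & \hat{\mathbf{e}}_\theta r & \hat{\mathbf{e}}_\varphi r\sin\theta\\ \dfrac{\partial}{\partial r} & \dfrac{\partial}{\partial\theta} & \dfrac{\partial}{\partial\varphi}\\ A_r & rA_\theta & r\sin\theta\,A_\varphi\end{vmatrix}. \tag{16.16}$$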
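In cylindrical coordinates we get

$$\nabla\times\mathbf{A} = \frac{1}{\rho}\begin{vmatrix}\hat{\mathbf{e}}_\rho & \hat{\mathbf{e}}_\varphi\rho & \hat{\mathbf{e}}_z\\ \dfrac{\partial}{\partial\rho} & \dfrac{\partial}{\partial\varphi} & \dfrac{\partial}{\partial z}\\ A_\rho & \rho A_\varphi & A_z\end{vmatrix}. \tag{16.17}$$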
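Expanding the determinant (16.15) gives the first component derived above, $(\nabla\times\mathbf{A})_1 = \frac{1}{h_2h_3}\left[\frac{\partial}{\partial q_2}(h_3A_3) - \frac{\partial}{\partial q_3}(h_2A_2)\right]$, together with its cyclic permutations $1\to2\to3\to1$. The sketch below, an illustration with an arbitrary test field, implements these three formulas with central differences and checks them in spherical coordinates against the Cartesian curl of $\mathbf{A} = (yz, -xz, xy)$, which is $(2x, 0, -2z)$.

```python
import math

def unit_vectors(theta, phi):
    """Spherical unit vectors e_r, e_theta, e_phi in Cartesian components."""
    st, ct, sp, cp = math.sin(theta), math.cos(theta), math.sin(phi), math.cos(phi)
    return ((st * cp, st * sp, ct), (ct * cp, ct * sp, -st), (-sp, cp, 0.0))

def to_cartesian(r, theta, phi):
    return (r * math.sin(theta) * math.cos(phi),
            r * math.sin(theta) * math.sin(phi),
            r * math.cos(theta))

def project(vec, theta, phi):
    """Spherical components of a Cartesian vector: dot products with e_r, e_theta, e_phi."""
    return tuple(sum(v * e for v, e in zip(vec, u)) for u in unit_vectors(theta, phi))

def a_cart(x, y, z):
    """Test field; its Cartesian curl is (2x, 0, -2z)."""
    return (y * z, -x * z, x * y)

def a_spherical(r, theta, phi):
    return project(a_cart(*to_cartesian(r, theta, phi)), theta, phi)

def spherical_h(r, theta, phi):
    return (1.0, r, r * math.sin(theta))

def curl_curvilinear(comps, scale, q, step=1e-5):
    """Components of (16.15): (curl A)_i = [d(h_k A_k)/dq_j - d(h_j A_j)/dq_k] / (h_j h_k)."""
    def d_ha(n, m):
        """Partial derivative of h_n * A_n with respect to q_m, by central differences."""
        qp, qm = list(q), list(q)
        qp[m] += step
        qm[m] -= step
        return (scale(*qp)[n] * comps(*qp)[n] - scale(*qm)[n] * comps(*qm)[n]) / (2 * step)
    h = scale(*q)
    return tuple((d_ha(k, j) - d_ha(j, k)) / (h[j] * h[k])
                 for i, j, k in ((0, 1, 2), (1, 2, 0), (2, 0, 1)))

r, theta, phi = 1.2, 0.9, 2.0
x, y, z = to_cartesian(r, theta, phi)
numerical = curl_curvilinear(a_spherical, spherical_h, (r, theta, phi))
reference = project((2 * x, 0.0, -2 * z), theta, phi)
print(numerical)
print(reference)
```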


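Example 16.4.2. We have already calculated the magnetic field of a dipole in
Example 16.2.2. Here we want to obtain the same result using the vector potential
of a dipole given in Equation (15.12). We take $\boldsymbol{\mu}$ to be along the $z$-axis. Then

$$\boldsymbol{\mu} = \mu\hat{\mathbf{e}}_z = \mu(\hat{\mathbf{e}}_r\cos\theta - \hat{\mathbf{e}}_\theta\sin\theta)$$

and

$$\boldsymbol{\mu}\times\hat{\mathbf{e}}_r = \mu(-\sin\theta\,\hat{\mathbf{e}}_\theta\times\hat{\mathbf{e}}_r) = \mu\sin\theta\,\hat{\mathbf{e}}_\varphi.$$

Therefore,

$$\mathbf{B} = \nabla\times\mathbf{A} = \nabla\times\left(\frac{k_m\boldsymbol{\mu}\times\hat{\mathbf{e}}_r}{r^2}\right) = \nabla\times\left(\frac{k_m\mu\sin\theta}{r^2}\hat{\mathbf{e}}_\varphi\right) = \frac{k_m\mu}{r^2\sin\theta}\begin{vmatrix}\hat{\mathbf{e}}_r & \hat{\mathbf{e}}_\theta r & \hat{\mathbf{e}}_\varphi r\sin\theta\\ \dfrac{\partial}{\partial r} & \dfrac{\partial}{\partial\theta} & \dfrac{\partial}{\partial\varphi}\\ 0 & 0 & \dfrac{\sin^2\theta}{r}\end{vmatrix}$$

$$= \frac{k_m\mu}{r^2\sin\theta}\left[\hat{\mathbf{e}}_r\left(\frac{2\sin\theta\cos\theta}{r}\right) - r\hat{\mathbf{e}}_\theta\left(-\frac{\sin^2\theta}{r^2}\right)\right] = \frac{k_m\mu}{r^3}\left(2\hat{\mathbf{e}}_r\cos\theta + \hat{\mathbf{e}}_\theta\sin\theta\right),$$

which is the expression obtained in Example 16.2.2.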


Example 16.4.3. Consider the vector field $\mathbf{B}$ described in cylindrical coordinates
as

$$\mathbf{B} = \frac{k}{\rho}\hat{\mathbf{e}}_\varphi,$$

where $k$ is a constant. The curl of $\mathbf{B}$ is easily found to be zero:

$$\nabla\times\mathbf{B} = \frac{1}{\rho}\begin{vmatrix}\hat{\mathbf{e}}_\rho & \hat{\mathbf{e}}_\varphi\rho & \hat{\mathbf{e}}_z\\ \dfrac{\partial}{\partial\rho} & \dfrac{\partial}{\partial\varphi} & \dfrac{\partial}{\partial z}\\ 0 & \rho(k/\rho) & 0\end{vmatrix} = 0.$$

However, for any circle (of radius $a$, for example) centered at the origin and located
in the $xy$-plane, we get²

$$\oint_C\mathbf{B}\cdot d\mathbf{r} = \int_0^{2\pi}\frac{k}{a}\hat{\mathbf{e}}_\varphi\cdot(\hat{\mathbf{e}}_\varphi\,a\,d\varphi) = 2\pi k \neq 0.$$

² See also Example 14.3.3, which discusses this same vector field in Cartesian coordinates.

The reason for this result is that the circle cannot be contracted to a point within
the region where $\mathbf{B}$ is defined: at the origin, which is inside the circle and at which
$\rho = 0$, $\mathbf{B}$ is not defined.

This vector field should look familiar. It is the magnetic field due to a long
straight wire carrying a current along the $z$-axis. According to Ampère’s circuital
law, the line integral of $\mathbf{B}$ along any closed curve encircling the wire, such as the
above circle, gives, up to a multiplicative constant, the current in the wire, and this
current is not zero. ■
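The contrast between $\nabla\times\mathbf{B} = 0$ and a nonzero circulation is easy to see numerically. In the sketch below (an illustration with $k = 1$ and arbitrarily chosen circles) we use the Cartesian form $\mathbf{B} = k(-y, x, 0)/(x^2 + y^2)$, which equals $(k/\rho)\hat{\mathbf{e}}_\varphi$, and integrate around one circle that encloses the origin and one that does not.

```python
import math

K = 1.0  # the constant k, set to 1 for illustration

def b_field(x, y):
    """(k/rho) e_phi written in Cartesian components (the z-component is zero)."""
    rho2 = x * x + y * y
    return (-K * y / rho2, K * x / rho2)

def circulation(cx, cy, radius, n=4000):
    """Line integral of B around a circle (center (cx, cy)), traversed counterclockwise."""
    total = 0.0
    dt = 2 * math.pi / n
    for i in range(n):
        t = i * dt
        x, y = cx + radius * math.cos(t), cy + radius * math.sin(t)
        dx, dy = -radius * math.sin(t) * dt, radius * math.cos(t) * dt  # dr along the circle
        bx, by = b_field(x, y)
        total += bx * dx + by * dy
    return total

print(circulation(0.0, 0.0, 1.5))  # circle encloses the origin
print(circulation(3.0, 0.0, 1.0))  # circle does not enclose the origin
```

The first value should be $2\pi k$ regardless of the radius, and the second essentially zero, since that circle can be shrunk to a point without crossing the origin.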


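Example 16.4.4. A vector field that can be written as

$$\mathbf{F} = f(r)\,\mathbf{r},$$

where $\mathbf{r}$ is the displacement vector from the origin, is conservative. It is instructive
to show this using both Cartesian and spherical coordinate systems.

First, in Cartesian coordinates

$$\mathbf{F} = xf(r)\hat{\mathbf{e}}_x + yf(r)\hat{\mathbf{e}}_y + zf(r)\hat{\mathbf{e}}_z$$

and the curl is

$$\nabla\times\mathbf{F} = \begin{vmatrix}\hat{\mathbf{e}}_x & \hat{\mathbf{e}}_y & \hat{\mathbf{e}}_z\\ \dfrac{\partial}{\partial x} & \dfrac{\partial}{\partial y} & \dfrac{\partial}{\partial z}\\ xf & yf & zf\end{vmatrix}$$

$$= \hat{\mathbf{e}}_x\left[\frac{\partial}{\partial y}(zf) - \frac{\partial}{\partial z}(yf)\right] + \hat{\mathbf{e}}_y\left[\frac{\partial}{\partial z}(xf) - \frac{\partial}{\partial x}(zf)\right] + \hat{\mathbf{e}}_z\left[\frac{\partial}{\partial x}(yf) - \frac{\partial}{\partial y}(xf)\right].$$

Concentrating on the $x$-component first and using the chain rule, we have

$$\frac{\partial}{\partial y}(zf) = z\frac{\partial f}{\partial y} = z\frac{df}{dr}\frac{\partial r}{\partial y} = zf'\frac{\partial r}{\partial y}.$$

But

$$\frac{\partial r}{\partial y} = \frac{\partial}{\partial y}\sqrt{x^2 + y^2 + z^2} = \frac{y}{r}.$$

Thus,

$$\frac{\partial}{\partial y}(zf) = \frac{yz}{r}f'.$$

Similarly,

$$\frac{\partial}{\partial z}(yf) = \frac{yz}{r}f'.$$

Therefore, the $x$-component of $\nabla\times\mathbf{F}$ is zero. The $y$- and $z$-components can also be
shown to be zero, and we get $\nabla\times\mathbf{F} = 0$.

On the other hand, using spherical coordinates, where $\mathbf{F} = rf(r)\hat{\mathbf{e}}_r$ and $rf(r)$ depends
on neither $\theta$ nor $\varphi$, we easily obtain

$$\nabla\times\mathbf{F} = \frac{1}{r^2\sin\theta}\begin{vmatrix}\hat{\mathbf{e}}_r & \hat{\mathbf{e}}_\theta r & \hat{\mathbf{e}}_\varphi r\sin\theta\\ \dfrac{\partial}{\partial r} & \dfrac{\partial}{\partial\theta} & \dfrac{\partial}{\partial\varphi}\\ rf(r) & 0 & 0\end{vmatrix} = 0.$$

Obviously, the use of spherical coordinates simplifies the calculation
considerably. ■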
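The Cartesian calculation can also be confirmed numerically. The sketch below uses $f(r) = e^{-r}$ as an arbitrary example (any smooth $f$ would serve) and evaluates all three components of $\nabla\times\mathbf{F}$ by central differences at an arbitrary point.

```python
import math

def f(r):
    """Arbitrary smooth radial profile chosen for the illustration."""
    return math.exp(-r)

def field(x, y, z):
    """F = f(r) r in Cartesian components."""
    r = math.sqrt(x * x + y * y + z * z)
    return (x * f(r), y * f(r), z * f(r))

def partial(component, axis, p, step=1e-6):
    """dF_component / dx_axis at the point p, by central differences."""
    pp, pm = list(p), list(p)
    pp[axis] += step
    pm[axis] -= step
    return (field(*pp)[component] - field(*pm)[component]) / (2 * step)

def curl(p):
    """Cartesian curl, component by component."""
    return (partial(2, 1, p) - partial(1, 2, p),
            partial(0, 2, p) - partial(2, 0, p),
            partial(1, 0, p) - partial(0, 1, p))

print(curl((0.4, -1.1, 0.7)))  # each component is zero up to finite-difference error
```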
The preceding example shows that


Box 16.4.1. Any well-behaved vector field whose magnitude is only a
function of the radial distance $r$, and whose direction is along $\mathbf{r}$, is
conservative. Such vector fields are generally known as central vector fields.


16.4.1 The Laplacian 

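Combining the divergence and the gradient gives the Laplacian. Using Equation
(16.4) in Theorem 16.3.1, we get

Theorem 16.4.5. The Laplacian of a function $f$ is the divergence of the gradient
of $f$ and, in a curvilinear coordinate system, is given by

$$\nabla^2 f = \frac{1}{h_1h_2h_3}\left[\frac{\partial}{\partial q_1}\left(\frac{h_2h_3}{h_1}\frac{\partial f}{\partial q_1}\right) + \frac{\partial}{\partial q_2}\left(\frac{h_1h_3}{h_2}\frac{\partial f}{\partial q_2}\right) + \frac{\partial}{\partial q_3}\left(\frac{h_1h_2}{h_3}\frac{\partial f}{\partial q_3}\right)\right].$$

For cylindrical coordinates the Laplacian is

$$\nabla^2 f = \frac{1}{\rho}\frac{\partial}{\partial\rho}\left(\rho\frac{\partial f}{\partial\rho}\right) + \frac{1}{\rho^2}\frac{\partial^2 f}{\partial\varphi^2} + \frac{\partial^2 f}{\partial z^2}, \tag{16.18}$$

and for spherical coordinates it is

$$\nabla^2 f = \frac{1}{r^2}\frac{\partial}{\partial r}\left(r^2\frac{\partial f}{\partial r}\right) + \frac{1}{r^2}\left[\frac{1}{\sin\theta}\frac{\partial}{\partial\theta}\left(\sin\theta\frac{\partial f}{\partial\theta}\right) + \frac{1}{\sin^2\theta}\frac{\partial^2 f}{\partial\varphi^2}\right]. \tag{16.19}$$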
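As a check on (16.19), the sketch below (illustrative; the test function and sample point are arbitrary) applies the general formula of Theorem 16.4.5 with the spherical scale factors, using nested central differences, to $f = x^2y + z^3$, whose Cartesian Laplacian is $2y + 6z$.

```python
import math

def f_cart(x, y, z):
    return x * x * y + z**3

def f_sph(r, theta, phi):
    """The same function expressed in spherical coordinates."""
    return f_cart(r * math.sin(theta) * math.cos(phi),
                  r * math.sin(theta) * math.sin(phi),
                  r * math.cos(theta))

def spherical_h(r, theta, phi):
    return (1.0, r, r * math.sin(theta))

def d(func, q, i, step):
    """Central-difference partial derivative with respect to the i-th argument."""
    qp, qm = list(q), list(q)
    qp[i] += step
    qm[i] -= step
    return (func(*qp) - func(*qm)) / (2 * step)

def laplacian_curvilinear(func, scale, q, step=1e-4):
    """Theorem 16.4.5: (1/H) sum_i d/dq_i [(H / h_i^2) df/dq_i], with H = h1 h2 h3."""
    def weighted_derivative(i):
        def g(*qq):
            h = scale(*qq)
            big_h = h[0] * h[1] * h[2]
            return big_h / h[i] ** 2 * d(func, qq, i, step)
        return g
    h = scale(*q)
    return sum(d(weighted_derivative(i), q, i, step) for i in range(3)) / (h[0] * h[1] * h[2])

q = (1.4, 0.8, 2.2)  # (r, theta, phi)
x = q[0] * math.sin(q[1]) * math.cos(q[2])
y = q[0] * math.sin(q[1]) * math.sin(q[2])
z = q[0] * math.cos(q[1])
numerical = laplacian_curvilinear(f_sph, spherical_h, q)
print(numerical, 2 * y + 6 * z)
```

With the cylindrical scale factors $(1, \rho, 1)$ the same function reproduces (16.18).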


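Equations (16.7) and (16.19) allow us to write the angular momentum
differential operator derived in Example 15.3.1 in spherical coordinates, which
is the most common way of writing it. Since $\mathbf{r} = r\hat{\mathbf{e}}_r$, Equation (16.7) gives

$$\mathbf{r}\cdot\nabla f = r\frac{\partial f}{\partial r}$$

and

$$(\mathbf{r}\cdot\nabla)^2 f = r\frac{\partial}{\partial r}\left(r\frac{\partial f}{\partial r}\right) = r\frac{\partial f}{\partial r} + r^2\frac{\partial^2 f}{\partial r^2}.$$

Substituting these plus (16.19) in (15.22) yields

$$L^2 f = -\left[\frac{1}{\sin\theta}\frac{\partial}{\partial\theta}\left(\sin\theta\frac{\partial f}{\partial\theta}\right) + \frac{1}{\sin^2\theta}\frac{\partial^2 f}{\partial\varphi^2}\right]. \tag{16.20}$$

Therefore, in spherical coordinates the angular momentum operator depends only on
the angles.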
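Equation (16.20) can be tested on simple angular functions. With the overall minus sign shown there, $\cos\theta$ and $\sin^2\theta\cos 2\varphi$ (combinations of spherical harmonics with $l = 1$ and $l = 2$) should satisfy $L^2 f = l(l+1)f$. The sketch below, an illustration rather than anything from the text, evaluates the right-hand side of (16.20) with central differences at an arbitrary point.

```python
import math

def l_squared(func, theta, phi, step=1e-4):
    """RHS of (16.20): -[(1/sin) d/dtheta (sin df/dtheta) + (1/sin^2) d^2f/dphi^2]."""
    def sin_df_dtheta(t):
        return math.sin(t) * (func(t + step, phi) - func(t - step, phi)) / (2 * step)
    polar = (sin_df_dtheta(theta + step) - sin_df_dtheta(theta - step)) / (2 * step)
    polar /= math.sin(theta)
    azimuthal = (func(theta, phi + step) - 2 * func(theta, phi) + func(theta, phi - step)) / step**2
    return -(polar + azimuthal / math.sin(theta) ** 2)

def y1(theta, phi):
    return math.cos(theta)

def y2(theta, phi):
    return math.sin(theta) ** 2 * math.cos(2 * phi)

theta, phi = 0.7, 0.3
print(l_squared(y1, theta, phi) / y1(theta, phi))  # should be close to l(l+1) = 2
print(l_squared(y2, theta, phi) / y2(theta, phi))  # should be close to l(l+1) = 6
```

No radial variable appears anywhere in the code, which is the numerical counterpart of the remark that the operator depends only on the angles.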
16.5 Problems

16.1. The divergence of a vector can be obtained in any coordinate system
by brute-force calculation. In this problem you are asked to find $\nabla\cdot\mathbf{A}$ in
cylindrical coordinates.

(a) Express $A_x$ in terms of cylindrical coordinates and components. Hint:
Write $\mathbf{A}$ in cylindrical coordinates and take the dot product with $\hat{\mathbf{e}}_x$, expressing
everything in terms of cylindrical coordinates.

(b) Use the chain rule

$$\frac{\partial A_x}{\partial x} = \frac{\partial A_x}{\partial\rho}\frac{\partial\rho}{\partial x} + \frac{\partial A_x}{\partial\varphi}\frac{\partial\varphi}{\partial x} + \frac{\partial A_x}{\partial z}\frac{\partial z}{\partial x},$$

where $A_x$ is what you found in (a).

(c) Do the same with $A_y$ and $A_z$, and add the three terms to obtain the
divergence in cylindrical coordinates.

16.2. Find the divergence of a vector in spherical coordinates following the 
procedure outlined in Problem 16.1. 

16.3. Find the gradient of a function in cylindrical and spherical coordinates 
following a procedure similar to the one outlined in Problem 16.1. 

16.4. Find the curl of a vector in cylindrical and spherical coordinates
following a procedure similar to the one outlined in Problem 16.1.

16.5. Start with the Laplacian in Cartesian coordinates. 

(a) By using the chain rule and expressing the second derivatives in cylindrical 
coordinates, find the Laplacian in cylindrical coordinates. 

(b) Do the same for spherical coordinates. 

16.6. The elliptic cylindrical coordinates $(u, \theta, z)$ are given by

$$x = a\cosh u\cos\theta,\qquad y = a\sinh u\sin\theta,\qquad z = z,$$

where $a$ is a constant.

(a) What is the expression for the gradient of a function $f$ in elliptic cylindrical
coordinates?

(b) What is the expression for the divergence of a vector $\mathbf{A}$ in elliptic cylindrical
coordinates?

(c) What is the expression for the curl of a vector $\mathbf{A}$ in elliptic cylindrical
coordinates?

(d) What is the expression for the Laplacian of a function $f$ in elliptic cylindrical
coordinates?
16.7. The prolate spheroidal coordinates $(u, \theta, \varphi)$ are given by

$$x = a\sinh u\sin\theta\cos\varphi,\qquad y = a\sinh u\sin\theta\sin\varphi,\qquad z = a\cosh u\cos\theta,$$

where $a$ is a constant.

(a) What is the expression for the gradient of a function $f$ in prolate spheroidal
coordinates?

(b) What is the expression for the divergence of a vector $\mathbf{A}$ in prolate spheroidal
coordinates?

(c) What is the expression for the curl of a vector $\mathbf{A}$ in prolate spheroidal
coordinates?

(d) What is the expression for the Laplacian of a function $f$ in prolate
spheroidal coordinates?

16.8. The toroidal coordinates $(u, \theta, \varphi)$ are given by

$$x = \frac{a\sinh u\cos\varphi}{\cosh u - \cos\theta},\qquad y = \frac{a\sinh u\sin\varphi}{\cosh u - \cos\theta},\qquad z = \frac{a\sin\theta}{\cosh u - \cos\theta}.$$

(a) What is the expression for the gradient of a function $f$ in toroidal coordinates?

(b) What is the expression for the divergence of a vector $\mathbf{A}$ in toroidal coordinates?

(c) What is the expression for the curl of a vector $\mathbf{A}$ in toroidal coordinates?

(d) What is the expression for the Laplacian of a function $f$ in toroidal coordinates?

16.9. The paraboloidal coordinates $(u, v, \varphi)$ are given by

$$x = 2auv\cos\varphi,\qquad y = 2auv\sin\varphi,\qquad z = a(u^2 - v^2),$$

where $a$ is a constant.

(a) What is the expression for the gradient of a function $f$ in paraboloidal
coordinates?

(b) What is the expression for the divergence of a vector $\mathbf{A}$ in paraboloidal
coordinates?

(c) What is the expression for the curl of a vector $\mathbf{A}$ in paraboloidal coordinates?

(d) What is the expression for the Laplacian of a function $f$ in paraboloidal
coordinates?
16.10. The three-dimensional bipolar coordinates $(u, \theta, \varphi)$ are given by

$$x = \frac{a\sin\theta\cos\varphi}{\cosh u - \cos\theta},\qquad y = \frac{a\sin\theta\sin\varphi}{\cosh u - \cos\theta},\qquad z = \frac{a\sinh u}{\cosh u - \cos\theta}.$$

(a) What is the expression for the gradient of a function $f$ in three-dimensional
bipolar coordinates?

(b) What is the expression for the divergence of a vector $\mathbf{A}$ in three-dimensional
bipolar coordinates?

(c) What is the expression for the curl of a vector $\mathbf{A}$ in three-dimensional
bipolar coordinates?

(d) What is the expression for the Laplacian of a function $f$ in three-dimensional
bipolar coordinates?



Chapter 17 

Tensor Analysis 


Our study of vectors in this part of the book has been limited to their
analysis in specific coordinate systems, and although we touched on the general
curvilinear coordinate system, our treatment aimed at orthogonal coordinates,
and specifically at only three-dimensional spherical and cylindrical coordinate
systems. Many situations in physics demand a threefold generalization:
nonorthogonal coordinate systems, higher-dimensional spaces, and objects, called
tensors, whose components carry more than one index. This chapter is
devoted to an analysis of tensors.

17.1 Vectors and Indices 

Vector manipulations will be greatly simplified if equations are written in
terms of a general component. How do we accomplish this? Start with a
generic vector equation, which can be written as

$$\mathbf{U} = \mathbf{V},$$

where $\mathbf{U}$ and $\mathbf{V}$ are, in general, vector expressions. Examples of such an
equation are

$$\mathbf{B} = \nabla\times\mathbf{A},\qquad \mathbf{E} = -\nabla\phi,\qquad \mathbf{A} = \int_a^b f(r)\,\hat{\mathbf{e}}_r\,dr.$$

You can also write each of these vector equations as three equations involving
components. Thus, the foregoing generic equation becomes

$$U_x = V_x,\qquad U_y = V_y,\qquad U_z = V_z.$$

It is very helpful to convert letter indices into number indices. Let $x\to 1$,
$y\to 2$, and $z\to 3$, and write¹

$$U_1 = V_1,\qquad U_2 = V_2,\qquad U_3 = V_3.$$

¹ Note that the replacements here refer to indices, not the Cartesian coordinates. The
latter will have somewhat different symbols in the sequel.
These equations are abbreviated as

$$U_i = V_i,\qquad i = 1,2,3. \tag{17.1}$$

This is what we mean by an equation in terms of a general component: The
index $i$ refers to any one of the components of the vectors on either side of
the equation. It is called a free index because it is free to take any one of the
values between 1 and 3. An important property of a free index is that


Box 17.1.1. A free index appears once and only once on both sides of 
a vector equation. 


One can use any symbol to represent a free index, although the most common
symbols used are $i, j, k, l, m$, and $n$. Thus, Equation (17.1) can be written in
any one of the following alternative ways:

$$U_j = V_j,\quad j = 1,2,3,$$

$$U_p = V_p,\quad p = 1,2,3,$$

$$U_\alpha = V_\alpha,\quad \alpha = 1,2,3.$$

Of special interest are the components of the position vector $\mathbf{r}$. These are
denoted by $x_i$ rather than $r_i$. Thus, the vector relation $\mathbf{R} = \mathbf{r} - \mathbf{r}'$ is written
as

$$R_j = x_j - x'_j,\qquad j = 1,2,3.$$

An abbreviation used for derivatives with respect to Cartesian coordinates
(which coincide with the components of the position vector) is given as follows.
First $\partial/\partial x$ is replaced by $\partial/\partial x_1$, and the latter by the much shorter notation
$\partial_1$. Similarly, $\partial/\partial y$ becomes $\partial_2$, and $\partial/\partial z$ becomes $\partial_3$. In particular, the
general component of the gradient of a function $f$ will be written as $\partial_k f$, $k =
1,2,3$.

All operations on vectors can be translated into the language of indexed
relations. For example, $\mathbf{A} + \mathbf{B} = \mathbf{C}$ is equivalent to $A_k + B_k = C_k$, $k = 1,2,3$,
and $\mathbf{A} = \alpha\mathbf{B}$ becomes $A_k = \alpha B_k$, $k = 1,2,3$, etc. The two operations of
vector multiplication are a little more involved and we treat them separately
in the following.

First let us consider the dot product. In terms of components, the dot
product of $\mathbf{A}$ and $\mathbf{B}$ can be written as

$$\mathbf{A}\cdot\mathbf{B} = A_xB_x + A_yB_y + A_zB_z.$$

Converting to number indices, we get

$$\mathbf{A}\cdot\mathbf{B} = A_1B_1 + A_2B_2 + A_3B_3 = \sum_{i=1}^{3}A_iB_i.$$
We now introduce a further simplification in notation due to Einstein, which
gets rid of the clumsy summation sign:


Box 17.1.2. (Einstein Summation Convention). Whenever an
index is repeated, it is a dummy index and is summed from 1 to 3.


Using this convention we write the dot product as

$$\mathbf{A}\cdot\mathbf{B} = A_iB_i. \tag{17.2}$$

No summation sign is needed as long as we remember that the repeated index
$i$ is summed over. Since the repeated index is a dummy index, we can change
it to any other symbol. Thus,

$$\mathbf{A}\cdot\mathbf{B} = A_kB_k = A_jB_j = A_nB_n = A_\alpha B_\alpha = \cdots.$$


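Example 17.1.1. In this example, we write some of the familiar vector relations
in both vector form and component form:

| Vector form | Component form |
|---|---|
| $\mathbf{E} = -\nabla\Phi$ | $E_k = -\partial_k\Phi$ |
| $\nabla\cdot\mathbf{A}$ | $\partial_jA_j$ |
| $\oint_S\mathbf{A}\cdot d\mathbf{a} = \int_V\nabla\cdot\mathbf{A}\,dV$ | $\oint_S A_k\,da_k = \int_V\partial_lA_l\,dV$ |
| $\nabla^2\Phi$ | $\partial_m\partial_m\Phi$ |
| $\nabla\cdot(f\mathbf{A}) = \mathbf{A}\cdot\nabla f + f\nabla\cdot\mathbf{A}$ | $\partial_i(fA_i) = A_i\partial_if + f\partial_iA_i$ |

The reader is urged to verify all these relations, remembering the Einstein
summation convention. ■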
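The index relations translate directly into loops over the repeated index. The sketch below, an illustration with arbitrarily chosen $f$ and $\mathbf{A}$, checks the last relation of the table, $\partial_i(fA_i) = A_i\partial_if + f\partial_iA_i$, at one point; each `sum(... for i in range(3))` is the summation convention written out explicitly, and $\partial_i$ is approximated by a central difference.

```python
import math

def f(x):
    """Scalar test function; x is a sequence (x_1, x_2, x_3)."""
    return math.exp(x[0]) * math.sin(x[1]) + x[2] ** 2

def A(x):
    """Vector test field with components A_1, A_2, A_3."""
    return (x[0] * x[1], math.cos(x[2]), x[0] + x[1] * x[2])

def partial(func, i, x, step=1e-6):
    """d_i func at the point x, by central differences."""
    xp, xm = list(x), list(x)
    xp[i] += step
    xm[i] -= step
    return (func(xp) - func(xm)) / (2 * step)

x = (0.3, 1.1, -0.6)
# Left side: d_i (f A_i), with the repeated index i summed.
lhs = sum(partial(lambda y, i=i: f(y) * A(y)[i], i, x) for i in range(3))
# Right side: A_i d_i f + f d_i A_i, again summed over i.
rhs = sum(A(x)[i] * partial(f, i, x) + f(x) * partial(lambda y, i=i: A(y)[i], i, x)
          for i in range(3))
print(lhs, rhs)
```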


17.1.1 Transformation Properties of Vectors

Section 6.2.1 discussed the transformation of vectors, i.e., the way the components of a vector change when they are expressed in terms of a new basis. To
initiate the transformations relevant to the present chapter, let us begin with
the position vector $\mathbf{r}$, which in one Cartesian coordinate system (with basis
$\{\mathbf{e}_1, \mathbf{e}_2, \mathbf{e}_3\}$) is represented by $(x^1, x^2, x^3)$, and in another by $(\bar x^1, \bar x^2, \bar x^3)$. Here
we are beginning to introduce new notation and terminology: instead of “vector space,” we use “Cartesian coordinate system,” and instead of subscripts,
we use superscripts to label the coordinates.

Since both $(x^1, x^2, x^3)$ and $(\bar x^1, \bar x^2, \bar x^3)$ are components of the same position vector, they are related via Equation (6.29):

$$\begin{aligned}
\bar x^1 &= a_{11}x^1 + a_{12}x^2 + a_{13}x^3,\\
\bar x^2 &= a_{21}x^1 + a_{22}x^2 + a_{23}x^3,\\
\bar x^3 &= a_{31}x^1 + a_{32}x^2 + a_{33}x^3.
\end{aligned} \tag{17.3}$$




In terms of a free index, we can rewrite this as

$$\bar x^i = a_{i1}x^1 + a_{i2}x^2 + a_{i3}x^3, \qquad i = 1, 2, 3,$$

and using the summation notation

$$\bar x^i = \sum_{j=1}^{3} a_{ij}x^j, \qquad i = 1, 2, 3.$$

Finally, using the Einstein summation convention and always keeping in mind
that the free index $i$ takes the values 1, 2, or 3, we come up with the following
very succinct replacement for (17.3):

$$\bar x^i = a_{ij}x^j. \tag{17.4}$$

Equations (17.3) and (17.4) are identical despite the enormous brevity of the
latter.

As an application of the use of indices and the summation convention, we
conveniently express the rule of matrix multiplication, which we shall use
frequently. Box 6.1.3 gives this rule. Let $\mathsf{C} = \mathsf{AB}$ be the product of $\mathsf{A}$ and $\mathsf{B}$.
Then the rule in Box 6.1.3 can be written as

$$(\mathsf{C})_{ij} = (\mathsf{A})_{ik}(\mathsf{B})_{kj}. \tag{17.5}$$

Notice that here we have two free indices, $i$ and $j$. The index $k$ is being
summed over on the right.
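Rule (17.5) translates directly into nested loops: each free index becomes an outer loop and the dummy index $k$ becomes the inner sum. The sketch below uses plain Python lists (an illustrative choice; the function name `matmul` is ours), with indices starting at 0 rather than 1.

```python
def matmul(a, b):
    """Return c with c[i][j] = a[i][k] b[k][j], summed over the dummy index k."""
    inner = len(b)
    if any(len(row) != inner for row in a):
        raise ValueError("number of columns of a must equal number of rows of b")
    return [[sum(a[i][k] * b[k][j] for k in range(inner))  # k: dummy index
             for j in range(len(b[0]))]                     # j: free index
            for i in range(len(a))]                         # i: free index

a = [[1, 2], [3, 4]]
b = [[0, 1], [1, 0]]
print(matmul(a, b))  # [[2, 1], [4, 3]]
```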

Of particular importance are transformations that leave the dot product
intact. We called these transformations orthogonal (see Section 6.1.3). These
orthogonal transformations satisfy Equation (6.20), which can be written in
terms of indices. Noting that the $ij$th element of the unit matrix is $\delta_{ij}$, the
familiar Kronecker delta, which, as the reader may recall, is defined as

$$\delta_{ij} = \begin{cases} 1 & \text{if } i = j,\\ 0 & \text{if } i \ne j,\end{cases} \tag{17.6}$$

we rewrite (6.20) as

$$(\mathsf{A}^t)_{ik}(\mathsf{A})_{kj} = (\mathbf{1})_{ij} \quad\text{or}\quad a_{ki}a_{kj} = \delta_{ij}. \tag{17.7}$$

Now multiply both sides of (17.4) by $a_{ik}$ and sum over $i$ to get

$$a_{ik}\bar x^i = a_{ik}a_{ij}x^j = \delta_{kj}x^j = x^k,$$

where in the last step we used the most important property of the Kronecker
delta:






Box 17.1.3. When an indexed quantity shares a common repeated index
with the Kronecker delta (so that a sum over that index is understood), the
result is an expression in which both the sum and the Kronecker delta are
removed and the repeated index of the indexed quantity is replaced by the
other index of the Kronecker delta.


Thus the inverse of Equation (17.4) is

$$x^j = a_{ij}\bar x^i. \tag{17.8}$$

Note the difference in the position of the dummy index between this equation
and (17.4).
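Here is a quick numerical check of (17.4), (17.7), and (17.8). The sketch below takes a rotation about the $z$-axis as an example of an orthogonal matrix $a_{ij}$ (this particular choice, and the helper names, are assumptions for illustration). It verifies $a_{ki}a_{kj} = \delta_{ij}$ and confirms that summing over the first index of $a_{ij}$ undoes the transformation.

```python
import math

def rotation_z(angle):
    """Matrix a_ij of a rotation about the z-axis (a sample orthogonal transformation)."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, s, 0.0], [-s, c, 0.0], [0.0, 0.0, 1.0]]

def transform(a, x):
    # (17.4): xbar^i = a_ij x^j, summed over the second index j
    return [sum(a[i][j] * x[j] for j in range(3)) for i in range(3)]

def inverse_transform(a, xbar):
    # (17.8): x^j = a_ij xbar^i, summed over the first index i
    return [sum(a[i][j] * xbar[i] for i in range(3)) for j in range(3)]

def orthogonality_defect(a):
    # (17.7): largest |a_ki a_kj - delta_ij| over all i, j (zero for an orthogonal matrix)
    return max(abs(sum(a[k][i] * a[k][j] for k in range(3)) - (1.0 if i == j else 0.0))
               for i in range(3) for j in range(3))

a = rotation_z(0.4)
x = [1.0, 2.0, 3.0]
xbar = transform(a, x)
print(orthogonality_defect(a))      # of order 1e-16
print(inverse_transform(a, xbar))   # recovers x up to rounding
```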

Equations (17.4) and (17.8) give the transformation rules for the components of the position vector when one goes from one Cartesian coordinate
system to another. It should be clear that the same transformation rules
apply to the components of any vector, as long as one adheres to Cartesian
coordinate systems. Thus if $\bar V_i$ and $V_i$ represent the components of a vector
$\mathbf{V}$ in two Cartesian coordinate systems, then

$$\bar V_i = a_{ij}V_j \quad\text{and}\quad V_j = a_{ij}\bar V_i. \tag{17.9}$$


In fact, it is customary to define vectors in terms of their transformation
properties:


Box 17.1.4. A set of quantities $V_i$ is said to be the components of a
Cartesian vector $\mathbf{V}$ if, under the orthogonal transformation (17.4), the
transformed quantities $\bar V_i$ and the original quantities $V_i$ are related by (17.9).




Section 1.3 introduced the idea of expressing vectors in different coordinate
systems, mainly Cartesian, cylindrical, and spherical. In all cases, care was
taken to use orthogonal unit vectors. In fact, this has been the sole practice
throughout the book so far, and for good reason: the dot product of two
vectors (and hence the length of a vector, defined as the square root of the dot
product of the vector with itself) does not change when their components in
one set of orthogonal unit vectors are written in terms of their components in
another set of orthogonal unit vectors. This actually defines the orthogonal
transformation of Section 6.1.3, and Equation (6.20) or (17.7) guarantees the
invariance of the length of a vector.

Orthogonal transformations are not always the most suitable. As an example, consider a curve in space parametrized in a Cartesian coordinate system
by $x^i = f_i(t)$, where $f_1(t)$, $f_2(t)$, and $f_3(t)$ are some smooth functions. The
tangent to this curve (a vector) has components $\dot x^i = dx^i/dt = \dot f_i(t)$. Now
consider a new coordinate system, not necessarily Cartesian, given by

$$\bar x^i = g_i(x^1, x^2, x^3). \tag{17.10}$$

The curve can be written in terms of the new coordinates by substituting $f_i(t)$
for $x^i$ in (17.10):

$$\bar x^i = g_i(f_1(t), f_2(t), f_3(t)) \equiv h_i(t),$$

where the last identity defines the function $h_i(t)$. The components of the
tangent to the curve in the new coordinate system are given by the chain
rule:

$$\dot{\bar x}^i = \dot h_i(t) = \partial_1g_i\frac{df_1}{dt} + \partial_2g_i\frac{df_2}{dt} + \partial_3g_i\frac{df_3}{dt} = \partial_1g_i\,\dot x^1 + \partial_2g_i\,\dot x^2 + \partial_3g_i\,\dot x^3 = \partial_jg_i\,\dot x^j.$$

Recalling that $\partial_jg_i = \partial g_i/\partial x^j$ and that $g_i = \bar x^i$, this is usually written as

$$\dot{\bar x}^i = \frac{\partial\bar x^i}{\partial x^j}\dot x^j. \tag{17.11}$$


It is instructive to see what happens if $\bar x^i$ is given by (17.4). In that case,
we have

$$\frac{\partial\bar x^i}{\partial x^j} = \frac{\partial}{\partial x^j}\left(a_{ik}x^k\right) = a_{ik}\frac{\partial x^k}{\partial x^j} = a_{ik}\delta_{kj} = a_{ij}, \tag{17.12}$$

where we have used an obvious property of partial derivatives which is so useful
that it is worth boxing it:


Box 17.1.5. If $\{y_1, y_2, \ldots, y_m\}$ are independent variables, then $\partial y_i/\partial y_j = \delta_{ij}$.


Equation (17.12) shows that, when applied to Cartesian coordinate transformations, (17.11) is consistent with the definition of a vector as given in Box
17.1.4.

What about the inverse of (17.11)? Equation (17.10) can be treated as
three equations in the three unknowns $\{x^1, x^2, x^3\}$. One can then solve for these
unknowns as functions of the independent variables $\{\bar x^1, \bar x^2, \bar x^3\}$. Whether
or not one can actually solve (17.10) for $\{x^i\}$ depends on the form of the
functions $\{g_1, g_2, g_3\}$. If these functions satisfy certain (mild) mathematical
properties, then Equation (17.10) is said to be invertible and each $x^j$ can
be written as a function of the independent variables $\{\bar x^k\}$. We assume that
(17.10) is indeed invertible.

Treating $x^j$ as dependent and $\{\bar x^k\}$ as independent variables, using the
chain rule, and employing obvious notation, we can write

$$\dot x^j \equiv \frac{dx^j}{dt} = \frac{\partial x^j}{\partial\bar x^1}\frac{d\bar x^1}{dt} + \frac{\partial x^j}{\partial\bar x^2}\frac{d\bar x^2}{dt} + \frac{\partial x^j}{\partial\bar x^3}\frac{d\bar x^3}{dt} = \frac{\partial x^j}{\partial\bar x^k}\frac{d\bar x^k}{dt} = \frac{\partial x^j}{\partial\bar x^k}\dot{\bar x}^k. \tag{17.13}$$

Is this consistent with Equation (17.11)? In other words, if we substitute $\dot x^j$
from this equation into the right-hand side of (17.11), do we get $\dot{\bar x}^i$? Let’s
try it!

$$\text{RHS of (17.11)} = \frac{\partial\bar x^i}{\partial x^j}\dot x^j = \frac{\partial\bar x^i}{\partial x^j}\frac{\partial x^j}{\partial\bar x^k}\dot{\bar x}^k = \frac{\partial\bar x^i}{\partial\bar x^k}\dot{\bar x}^k = \delta_{ik}\dot{\bar x}^k = \dot{\bar x}^i,$$

where in the third equality we used the chain rule (2.16), in the fourth equality
we used Box 17.1.5 as applied to the independent variables $\bar x^i$, and in the last
equality we used Box 17.1.3. Thus, Equation (17.13) is indeed consistent with
(17.11). It is tempting to call objects which transform according to (17.11)
components of a vector. But before jumping to conclusions, let’s look at
another vector with which we are familiar.


17.1.2 Covariant and Contravariant Vectors 


The gradient of a function was first defined in Section 12.3. It is a vector
whose components are essentially derivatives of the function with respect to
the coordinates. Because we are interested in the transformation properties of
objects, we first have to clarify the notion of a function. A scalar function is
a physical quantity, such as temperature, which takes on a single value at each
point of space. Now, a point has an existence independent of any coordinate
system. Nevertheless, coordinates are useful for calculations. If the point
is described by $(x^1, x^2, x^3)$ in a coordinate system, and $\varphi$ denotes the scalar
function, then we write $\varphi(x^1, x^2, x^3)$ for the value of the scalar function at
that point. The same point is described by $(\bar x^1, \bar x^2, \bar x^3)$ in another coordinate
system, and the value of the scalar function in terms of the new coordinates is
$\bar\varphi(\bar x^1, \bar x^2, \bar x^3)$. In general, the form of the scalar function changes
when one changes the coordinates; hence the notation $\bar\varphi$ instead of $\varphi$. Clearly,

$$\bar\varphi(\bar x^1, \bar x^2, \bar x^3) = \varphi(x^1, x^2, x^3). \tag{17.14}$$

Now differentiate both sides with respect to $\bar x^i$. The left side gives the $i$th
component of the gradient of $\bar\varphi$; using the chain rule on the right side, we
get

$$\frac{\partial\bar\varphi}{\partial\bar x^i} = \frac{\partial\varphi}{\partial x^1}\frac{\partial x^1}{\partial\bar x^i} + \frac{\partial\varphi}{\partial x^2}\frac{\partial x^2}{\partial\bar x^i} + \frac{\partial\varphi}{\partial x^3}\frac{\partial x^3}{\partial\bar x^i} = \frac{\partial\varphi}{\partial x^j}\frac{\partial x^j}{\partial\bar x^i}.$$

We thus obtain

$$\frac{\partial\bar\varphi}{\partial\bar x^i} = \frac{\partial x^j}{\partial\bar x^i}\frac{\partial\varphi}{\partial x^j}, \tag{17.15}$$

which is a different transformation from (17.11).

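It appears that we have two kinds of vectors: those whose components
transform according to (17.11) and those transforming according to (17.15).
To further elucidate the discussion, let’s look at the dot product. Let $\mathbf{A}$ and
$\mathbf{B}$ be vectors which transform according to (17.11):

$$\bar A_i = \frac{\partial\bar x^i}{\partial x^j}A_j, \qquad \bar B_i = \frac{\partial\bar x^i}{\partial x^j}B_j.$$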




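The dot product in the $\bar x$ coordinate system is $\bar A_i\bar B_i$ (sum over repeated indices
understood!). Write this in terms of the $x$ coordinates:

$$\bar A_i\bar B_i = \frac{\partial\bar x^i}{\partial x^j}A_j\frac{\partial\bar x^i}{\partial x^k}B_k = \frac{\partial\bar x^i}{\partial x^j}\frac{\partial\bar x^i}{\partial x^k}A_jB_k.$$

The right-hand side does not, in general, reduce to a dot product.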

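Now consider one vector $\mathbf{U}$ that transforms according to (17.11) and another $\mathbf{V}$ that transforms according to (17.15):

$$\bar U_i = \frac{\partial\bar x^i}{\partial x^j}U_j, \qquad \bar V_i = \frac{\partial x^k}{\partial\bar x^i}V_k,$$

and take the dot product of these two vectors:

$$\bar U_i\bar V_i = \frac{\partial\bar x^i}{\partial x^j}\frac{\partial x^k}{\partial\bar x^i}U_jV_k = \frac{\partial x^k}{\partial x^j}U_jV_k = \delta_{kj}U_jV_k = U_jV_j, \tag{17.16}$$

where the second equality follows from the chain rule, the third from Box 17.1.5, and the last from Box 17.1.3.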


This is the magic of a general coordinate transformation! Although the functions $\{g_1, g_2, g_3\}$ of (17.10) are completely arbitrary (except for invertibility),
they respect the dot product, as long as one vector transforms according to
(17.11) and the other according to (17.15).
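The invariance in (17.16) can be seen numerically. For brevity, the sketch below works in two dimensions and takes the new coordinates $\bar x = (r, \theta)$ to be plane polar coordinates, an assumed example of (17.10). The Jacobians are written out by hand from $r = \sqrt{x^2 + y^2}$ and $\theta = \arctan(y/x)$, and the sample components `U` and `V` are arbitrary. Pairing a (17.11)-type vector with a (17.15)-type vector preserves the dot product, while pairing two (17.11)-type vectors does not.

```python
import math

def jacobians(x, y):
    """For the assumed example xbar = (r, theta), return
    J[i][j] = d xbar^i / d x^j  and  K[j][i] = d x^j / d xbar^i  at the point (x, y)."""
    r = math.hypot(x, y)
    theta = math.atan2(y, x)
    J = [[x / r, y / r],
         [-y / r**2, x / r**2]]
    K = [[math.cos(theta), -r * math.sin(theta)],
         [math.sin(theta), r * math.cos(theta)]]
    return J, K

def contravariant(J, U):
    # (17.11): Ubar_i = (d xbar^i / d x^j) U_j
    return [sum(J[i][j] * U[j] for j in range(2)) for i in range(2)]

def covariant(K, V):
    # (17.15): Vbar_i = (d x^k / d xbar^i) V_k
    return [sum(K[k][i] * V[k] for k in range(2)) for i in range(2)]

def dot(P, Q):
    # plain sum over the repeated index
    return sum(p * q for p, q in zip(P, Q))

J, K = jacobians(1.0, 2.0)
U, V = [0.5, -1.5], [2.0, 0.25]
print(dot(U, V), dot(contravariant(J, U), covariant(K, V)))      # the two values agree
print(dot(U, V), dot(contravariant(J, U), contravariant(J, V)))  # these differ
```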

So far we have been considering coordinates in a three-dimensional space.
However, as the discussion of this section makes clear, nothing prevents us
from generalizing to $n$ dimensions: the only change we have to make is that
the sums (and the repeated indices that imply them) should go from 1 to $n$.
For example, (17.10) becomes

$$\bar x^i = g_i(x^1, x^2, \ldots, x^n), \qquad i = 1, 2, \ldots, n. \tag{17.17}$$

This generalization is not purely academic because, as we saw in Chapter
8, relativity demands a four-dimensional spacetime. Having this generalization in mind, we make the following definition of the two kinds of vector
discussed above:


Box 17.1.6. The quantities $\{A^1, A^2, \ldots, A^n\}$ and $\{B_1, B_2, \ldots, B_n\}$ are said
to constitute the components of a contravariant and a covariant vector,
respectively, if, under a coordinate transformation (17.17), they transform
according to

$$\bar A^i = \frac{\partial\bar x^i}{\partial x^j}A^j \quad\text{and}\quad \bar B_i = \frac{\partial x^j}{\partial\bar x^i}B_j. \tag{17.18}$$


Note the placement of the indices on the two types of vector. Only when 
an “upper” index appears with a “lower” index in a sum is the result (the dot 
product) independent of the coordinate system used. Now the question arises:
If one needs an upper and a lower index in the sum to get a quantity that is 
invariant, how does one define the length of a contravariant vector (which has 
only an upper index) or a covariant vector (which has only a lower index)? 
For this, we need to wait until we have introduced tensors and, in particular, 
the metric tensor. 


17.2 From Vectors to Tensors


We have already discussed one kind of multiplication of vectors, the dot product [see Equation (17.2)]. Now we consider the cross product as a prototype
of objects that have more than one index. The cross product of two vectors involves different components of those vectors (as opposed to the same
components involved in the inner product). In terms of the index labels introduced above, this means that the cross product carries two indices. In fact,
consider two (covariant) vectors $A_i$ and $B_j$. The components of their cross
product are of the form $A_iB_j - A_jB_i$. In another coordinate system related
to the first by (17.10), the components are $\bar A_i\bar B_j - \bar A_j\bar B_i$. Using (17.18) in
Box 17.1.6 for $\mathbf{A}$ and $\mathbf{B}$, we get


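$$\bar A_i\bar B_j = \frac{\partial x^k}{\partial\bar x^i}A_k\frac{\partial x^h}{\partial\bar x^j}B_h = \frac{\partial x^k}{\partial\bar x^i}\frac{\partial x^h}{\partial\bar x^j}A_kB_h$$

and

$$\bar A_j\bar B_i = \frac{\partial x^k}{\partial\bar x^j}A_k\frac{\partial x^h}{\partial\bar x^i}B_h = \frac{\partial x^h}{\partial\bar x^i}\frac{\partial x^k}{\partial\bar x^j}A_kB_h = \frac{\partial x^k}{\partial\bar x^i}\frac{\partial x^h}{\partial\bar x^j}A_hB_k,$$

where in the last step we just changed the dummy indices [see Equation (9.4)].
Subtracting the last two equations, we get

$$\bar A_i\bar B_j - \bar A_j\bar B_i = \frac{\partial x^k}{\partial\bar x^i}\frac{\partial x^h}{\partial\bar x^j}\left(A_kB_h - A_hB_k\right).$$

Thus, if we define $C_{kh} = A_kB_h - A_hB_k$ as the components of $\mathbf{A}\times\mathbf{B}$, the last
equation gives their transformation property:

$$\bar C_{ij} = \frac{\partial x^k}{\partial\bar x^i}\frac{\partial x^h}{\partial\bar x^j}C_{kh}. \tag{17.19}$$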




Cross products are special cases of a more general category of mathematical objects called tensors which carry multiple indices. Some of the indices
may be upper, some lower. The most general tensor carries multiple upper
and multiple lower indices.
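In three dimensions the two-indexed $C_{ij}$ defined above is antisymmetric, so only three of its nine components are independent: $C_{23}$, $C_{31}$, and $C_{12}$. These are exactly the three components of $\mathbf{A}\times\mathbf{B}$. The short sketch below checks this for sample vectors (helper names are ours; indices 0, 1, 2 stand for 1, 2, 3).

```python
def two_index(A, B):
    # C_ij = A_i B_j - A_j B_i
    return [[A[i] * B[j] - A[j] * B[i] for j in range(3)] for i in range(3)]

def cross(A, B):
    # the familiar component formula for A x B
    return [A[1] * B[2] - A[2] * B[1],
            A[2] * B[0] - A[0] * B[2],
            A[0] * B[1] - A[1] * B[0]]

A, B = [1.0, 2.0, 3.0], [4.0, 5.0, 6.0]
C = two_index(A, B)
# The independent components C_23, C_31, C_12 reproduce A x B.
print([C[1][2], C[2][0], C[0][1]], cross(A, B))  # [-3.0, 6.0, -3.0] [-3.0, 6.0, -3.0]
```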
Box 17.2.1. A set of $n^{r+s}$ quantities $T^{i_1\cdots i_r}_{j_1\cdots j_s}$ is said to constitute the components of a tensor $\mathbf{T}$ of type $(r, s)$ if under a coordinate transformation
(17.17) they transform according to

$$\bar T^{i_1\cdots i_r}_{j_1\cdots j_s} = \frac{\partial\bar x^{i_1}}{\partial x^{h_1}}\cdots\frac{\partial\bar x^{i_r}}{\partial x^{h_r}}\frac{\partial x^{k_1}}{\partial\bar x^{j_1}}\cdots\frac{\partial x^{k_s}}{\partial\bar x^{j_s}}T^{h_1\cdots h_r}_{k_1\cdots k_s}. \tag{17.20}$$

$\{i_1, \ldots, i_r\}$ and $\{j_1, \ldots, j_s\}$ are called the contravariant and covariant
indices, respectively. The rank of the tensor is defined as $r + s$.


Note that for every index on the left there is an identical index on the right,
and that only an upper index and its lower partner are repeated on the right.
Here we are using the obvious convention that in the partial derivatives of the
form $\partial\bar x^k/\partial x^j$ or $\partial x^k/\partial\bar x^j$, $k$ is considered an upper index and $j$ a lower one.


Example 17.2.1. When we introduced multipoles in Chapter 10, we were able to 
write the potential of a source distribution as an infinite sum of moments of source of 
higher and higher order. Although Cartesian coordinates are extremely clumsy for 
higher moments, the third moment can be handled neatly in Cartesian coordinates 
once we use the machinery of indices developed in this section. 

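Recall that the integrand of the third term in the expansion of potential is [see
Equation (10.33)]

$$\text{Integrand} = \frac{r'^2}{r^3}\left[\frac{3}{2}\left(\hat{\mathbf{e}}_r\cdot\hat{\mathbf{e}}_{r'}\right)^2 - \frac{1}{2}\right] = \frac{r'^2}{r^3}\left[\frac{3}{2}\left(\frac{\mathbf{r}\cdot\mathbf{r}'}{rr'}\right)^2 - \frac{1}{2}\right].$$

Writing the position vectors in terms of their Cartesian components and rearranging
terms yields

$$\begin{aligned}
\text{Integrand} &= \frac{3}{2}\frac{(xx' + yy' + zz')^2}{r^5} - \frac{r'^2}{2r^3}\\
&= \frac{1}{2r^5}\bigl\{x^2(3x'^2 - r'^2) + y^2(3y'^2 - r'^2) + z^2(3z'^2 - r'^2)\\
&\qquad + 6xyx'y' + 6xzx'z' + 6yzy'z'\bigr\}.
\end{aligned} \tag{17.21}$$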


We want to express (17.21) in terms of indices. First let us concentrate on the terms
involving $x^2$, $y^2$, and $z^2$. Since these diagonal terms involve $x^2 = x_1x_1$, etc., it is
natural to define a two-indexed quantity, say $V_{ij}$, such that

$$x^2(3x'^2 - r'^2) = x_1x_1V_{11}, \qquad y^2(3y'^2 - r'^2) = x_2x_2V_{22}, \qquad z^2(3z'^2 - r'^2) = x_3x_3V_{33},$$

with

$$V_{11} = 3x'_1x'_1 - r'^2, \qquad V_{22} = 3x'_2x'_2 - r'^2, \qquad V_{33} = 3x'_3x'_3 - r'^2.$$

Next, we note that the off-diagonal terms such as $6xyx'y'$ can be written as $6x_ix_jx'_ix'_j$
(no summation!). It appears as if we can write all terms in the braces of Equation
(17.21) as $\sum_{i,j=1}^{3}x_ix_jV_{ij}$ if we can define $V_{ij}$ properly. The off-diagonal sum suggests defining $V_{ij}$ as $V_{ij} = 3x'_ix'_j$. The reader may wonder why we did not include
the factor of 6 in the definition. The reason is that when summed over indices, the
symmetry of $V_{ij}$ under interchange of its indices automatically introduces a factor
of 2. The problem with this definition is that when $i = j$, i.e., when evaluating the
diagonal terms, the $r'^2$ term is absent. To remedy this, we change the definition to

$$V_{ij} = 3x'_ix'_j - r'^2\delta_{ij}. \tag{17.22}$$

Then the Kronecker delta contributes only to the diagonal terms, as it should. The
reader is urged to show that

$$\text{Integrand} = \frac{1}{2r^5}\sum_{i,j=1}^{3}x_ix_jV_{ij} = \frac{1}{2r^5}x_ix_jV_{ij}, \tag{17.23}$$

where in the last equality the summation convention is implied.

Now we substitute this in Equation (10.33) and denote the third term as $\Phi_3(\mathbf{r})$.
This yields

$$\Phi_3(\mathbf{r}) = \frac{k_e}{2r^5}x_ix_j\int V_{ij}\,dQ(\mathbf{r}') = \frac{k_e}{r^5}x_ix_jQ_{ij}. \tag{17.24}$$

The last equation defines the components of the quadrupole moment:

$$Q_{ij} = \frac{1}{2}\int V_{ij}\,dQ(\mathbf{r}') = \frac{1}{2}\int\left(3x'_ix'_j - r'^2\delta_{ij}\right)dQ(\mathbf{r}'). \tag{17.25}$$

One can use (17.25) to calculate the quadrupole moment of any source distribution.
The quadrupole moment of electric charge distributions plays a significant role in
nuclear physics. ■
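For a discrete set of point charges the integral in (17.25) becomes a sum. The sketch below assumes that form, with charges given as hypothetical `(q, position)` pairs, and drops the overall constant $k_e$. It checks two things. First, $x_ix_jQ_{ij}/r^5$ reproduces the Legendre form $\sum q\,r'^2P_2(\cos\gamma)/r^3$ with $P_2(u) = (3u^2 - 1)/2$, i.e., the bracket from which (17.21) was derived. Second, $Q_{ij}$ is traceless, as (17.25) implies since $\delta_{ii} = 3$.

```python
import math

def quadrupole(charges):
    """Point-charge version of (17.25): Q_ij = (1/2) sum_a q_a (3 x'_i x'_j - r'^2 delta_ij).
    `charges` is a list of (q, (x', y', z')) pairs (an assumed input format)."""
    Q = [[0.0] * 3 for _ in range(3)]
    for q, xp in charges:
        r2 = sum(c * c for c in xp)
        for i in range(3):
            for j in range(3):
                Q[i][j] += 0.5 * q * (3 * xp[i] * xp[j] - (r2 if i == j else 0.0))
    return Q

def phi3_tensor(Q, x):
    # (17.24) without k_e: x_i x_j Q_ij / r^5
    r = math.sqrt(sum(c * c for c in x))
    return sum(x[i] * x[j] * Q[i][j] for i in range(3) for j in range(3)) / r**5

def phi3_legendre(charges, x):
    # the same term as sum_a q_a r'^2 P_2(cos gamma) / r^3, with P_2(u) = (3u^2 - 1)/2
    r = math.sqrt(sum(c * c for c in x))
    total = 0.0
    for q, xp in charges:
        rp = math.sqrt(sum(c * c for c in xp))
        u = sum(a * b for a, b in zip(x, xp)) / (r * rp)
        total += q * rp**2 * (3 * u * u - 1) / 2
    return total / r**3

charges = [(1.0, (0.0, 0.0, 1.0)), (1.0, (0.0, 0.0, -1.0)), (-2.0, (0.3, 0.0, 0.0))]
Q = quadrupole(charges)
field_point = (2.0, 1.0, 3.0)
print(phi3_tensor(Q, field_point), phi3_legendre(charges, field_point))  # equal up to rounding
print(sum(Q[i][i] for i in range(3)))  # trace, zero up to rounding
```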


A scalar (function) is a tensor of type (0,0); a contravariant vector is a 
tensor of type (1,0); a covariant vector is a tensor of type (0,1). Similarly, the 
cross product, the transformation of whose components is given in (17.19), is 
a tensor of type (0,2). Of special interest is the zero tensor, which can be of 
any type. Box 17.2.1 shows clearly that 


Box 17.2.2. If a tensor has zero components in one coordinate system, 
it has zero components in all coordinate systems. 


We have also encountered another two-indexed quantity, the Kronecker
delta. Is it a tensor? If so, what type? We may think (since we have chosen
both of its indices to be covariant) that it is of type (0,2). However, that
is not the case, for the following reason. Equation (17.6), which defines the
Kronecker delta, must hold in all coordinate systems. If the Kronecker delta were
of type (0,2), then it would transform according to

$$\bar\delta_{ij} = \frac{\partial x^k}{\partial\bar x^i}\frac{\partial x^h}{\partial\bar x^j}\delta_{kh} = \frac{\partial x^k}{\partial\bar x^i}\frac{\partial x^k}{\partial\bar x^j},$$

and the right-hand side does not, in general, satisfy Equation (17.6). For the same reason
the Kronecker delta cannot be a tensor of type (2,0). What if we define it to
be a tensor of type (1,1)? Then

$$\bar\delta^i_j = \frac{\partial\bar x^i}{\partial x^k}\frac{\partial x^h}{\partial\bar x^j}\delta^k_h = \frac{\partial\bar x^i}{\partial x^k}\frac{\partial x^k}{\partial\bar x^j} = \frac{\partial\bar x^i}{\partial\bar x^j} = \delta^i_j.$$


This shows that the proper way of indexing the Kronecker delta is to give it
one covariant and one contravariant index, i.e., to treat it as a tensor of type
(1,1).
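The contrast between the two candidate types can be checked with plane polar coordinates $\bar x = (r, \theta)$, used here as an assumed example. Transforming $\delta$ as a (1,1) tensor returns the identity matrix. Transforming it as a (0,2) tensor gives $\mathrm{diag}(1, r^2)$, which violates (17.6) whenever $r \ne 1$. The helper names below are ours.

```python
import math

def polar_jacobians(x, y):
    """J[i][k] = d xbar^i / d x^k and K[k][j] = d x^k / d xbar^j for xbar = (r, theta)."""
    r = math.hypot(x, y)
    J = [[x / r, y / r], [-y / r**2, x / r**2]]
    K = [[x / r, -y], [y / r, x]]  # dx/dr = cos, dx/dtheta = -r sin, dy/dr = sin, dy/dtheta = r cos
    return J, K

def delta_as_11(J, K):
    # (1,1) rule: deltabar^i_j = (d xbar^i/d x^k)(d x^h/d xbar^j) delta^k_h = J[i][k] K[k][j]
    return [[sum(J[i][k] * K[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def delta_as_02(K):
    # (0,2) rule: deltabar_ij = (d x^k/d xbar^i)(d x^h/d xbar^j) delta_kh = K[k][i] K[k][j]
    return [[sum(K[k][i] * K[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

J, K = polar_jacobians(1.0, 2.0)
print(delta_as_11(J, K))  # the identity matrix, up to rounding
print(delta_as_02(K))     # approximately [[1, 0], [0, 5]], since r^2 = 5 at this point
```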

Example 17.2.2. Chapter 8 introduced the idea of a four-vector, which is a
vector with four components labeled 0, 1, 2, 3, with 0 being the time component and
the rest the space components. It is common to label 4-vectors by Greek indices.
For example, $x^\alpha$ represents the coordinates, $u^\alpha = dx^\alpha/d\tau$ represents the 4-velocity,
$p^\alpha = mu^\alpha$ represents the 4-momentum, etc. The matrix $\eta$ can be naturally assumed
to be a tensor $\eta_{\alpha\beta}$, and the inner product of two 4-vectors $a^\alpha$ and $b^\beta$ can be written
as $\eta_{\alpha\beta}a^\alpha b^\beta$, with the summation over 0, 1, 2, 3 of a repeated index (one up, one
down) understood. Because we have used $i$, $j$, $k$, etc., for the space part, we shall
stick to this and write, for example, $u^\alpha = (u^0, u^i)$, and

$$\sum_{\alpha=0}^{3}a^\alpha b_\alpha = a^\alpha b_\alpha = a^0b_0 + \sum_{i=1}^{3}a^ib_i = a^0b_0 + a^ib_i.$$

■

The notation of the example above is very commonly used in relativity
theory:


Box 17.2.3. Greek indices, representing the four-dimensional spacetime,
run from 0 to 3, while Roman indices, representing the space part, run
from 1 to 3.
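The split $a^\alpha b_\alpha = a^0b_0 + a^ib_i$ of Example 17.2.2 can be mirrored in a few lines of code. The sketch assumes the signature $\eta = \mathrm{diag}(-1, 1, 1, 1)$; this sign convention is our assumption and should be replaced by the one adopted in Chapter 8 if that differs. Covariant components are obtained here as $b_\alpha = \eta_{\alpha\beta}b^\beta$, a step made precise by the index lowering discussed in Section 17.3.1.

```python
# Assumed signature eta = diag(-1, 1, 1, 1); flip the signs if Chapter 8 uses diag(1, -1, -1, -1).
ETA = [[-1, 0, 0, 0],
       [0, 1, 0, 0],
       [0, 0, 1, 0],
       [0, 0, 0, 1]]

def lower(a):
    """Covariant components a_alpha = eta_{alpha beta} a^beta."""
    return [sum(ETA[al][be] * a[be] for be in range(4)) for al in range(4)]

def inner(a, b):
    """eta_{alpha beta} a^alpha b^beta, with alpha and beta summed over 0..3."""
    return sum(ETA[al][be] * a[al] * b[be] for al in range(4) for be in range(4))

def inner_split(a, b):
    """a^alpha b_alpha written as a^0 b_0 + a^i b_i (Greek: 0..3, Roman: 1..3)."""
    b_low = lower(b)
    return a[0] * b_low[0] + sum(a[i] * b_low[i] for i in range(1, 4))

a, b = [2, 1, 0, 0], [3, 0, 1, 1]
print(inner(a, b), inner_split(a, b))  # -6 -6
```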


17.2.1 Algebraic Properties of Tensors 

In our treatment of vectors, we saw that there were some formal operations 
which they obeyed. For instance, we could multiply a vector by a number, 
we could add two vectors, and we could multiply two vectors to get a third 
vector. Tensors also have some important properties which we summarize in 
the following. 


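Addition

If $\mathbf{T}$ and $\mathbf{S}$ are tensors of type $(r, s)$, then their sum $\mathbf{U} = \mathbf{T} + \mathbf{S}$, defined
componentwise as

$$U^{i_1\cdots i_r}_{j_1\cdots j_s} = T^{i_1\cdots i_r}_{j_1\cdots j_s} + S^{i_1\cdots i_r}_{j_1\cdots j_s},$$

is also a tensor of type $(r, s)$. To show this, one simply has to demonstrate
that $U^{i_1\cdots i_r}_{j_1\cdots j_s}$ transform according to (17.20) in Box 17.2.1.

Moreover, if we define $\mathbf{V} = \alpha\mathbf{T}$ componentwise as

$$V^{i_1\cdots i_r}_{j_1\cdots j_s} = \alpha T^{i_1\cdots i_r}_{j_1\cdots j_s},$$

where $\alpha$ is a real number, then $\mathbf{V}$ is also a tensor of type $(r, s)$. The combination of these two operations makes the collection of tensors of type $(r, s)$ a
vector space.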
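Multiplication

If $\mathbf{T}$ is a tensor of type $(r_1, s_1)$ and $\mathbf{S}$ is a tensor of type $(r_2, s_2)$, then their
tensor product $\mathbf{U} = \mathbf{T}\otimes\mathbf{S}$, defined componentwise as

$$U^{i_1\cdots i_{r_1+r_2}}_{j_1\cdots j_{s_1+s_2}} = T^{i_1\cdots i_{r_1}}_{j_1\cdots j_{s_1}}S^{i_{r_1+1}\cdots i_{r_1+r_2}}_{j_{s_1+1}\cdots j_{s_1+s_2}}, \tag{17.26}$$

is a tensor of type $(r_1 + r_2, s_1 + s_2)$. For example, if $\mathbf{T}$ is a tensor of type
(2,1) with components $T^{ij}_k$ and $\mathbf{S}$ is a tensor of type (0,2) with components
$S_{lm}$, then the components of their tensor product $\mathbf{U}$ are $U^{ij}_{klm} = T^{ij}_kS_{lm}$, and
they transform according to the following rule:

$$\begin{aligned}
\bar U^{ij}_{klm} = \bar T^{ij}_k\bar S_{lm} &= \frac{\partial\bar x^i}{\partial x^h}\frac{\partial\bar x^j}{\partial x^p}\frac{\partial x^q}{\partial\bar x^k}T^{hp}_q\,\frac{\partial x^r}{\partial\bar x^l}\frac{\partial x^s}{\partial\bar x^m}S_{rs}\\
&= \frac{\partial\bar x^i}{\partial x^h}\frac{\partial\bar x^j}{\partial x^p}\frac{\partial x^q}{\partial\bar x^k}\frac{\partial x^r}{\partial\bar x^l}\frac{\partial x^s}{\partial\bar x^m}T^{hp}_qS_{rs} = \frac{\partial\bar x^i}{\partial x^h}\frac{\partial\bar x^j}{\partial x^p}\frac{\partial x^q}{\partial\bar x^k}\frac{\partial x^r}{\partial\bar x^l}\frac{\partial x^s}{\partial\bar x^m}U^{hp}_{qrs},
\end{aligned}$$

which shows that $\mathbf{U}$ is a tensor of type (2,3).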

Example 17.2.3. One can obtain a tensor of any type by multiplying contravariant and covariant vectors: take $r$ contravariant vectors and $s$ covariant vectors and
multiply them to get a tensor of type $(r, s)$. For example, if $\mathbf{A}$ is a contravariant
vector with components $A^i$ and $\mathbf{B}$ a covariant vector with components $B_j$, then
$T^{ij} = A^iA^j$ is a tensor of type (2,0), $S_{ijk} = B_iB_jB_k$ is a tensor of type (0,3), and
the quantities $A^iA^jB_k$ form a tensor of type (2,1). ■
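Example 17.2.3 can be checked in the simplest setting, a linear change of coordinates $\bar x^i = M_{ih}x^h$, for which the partial derivatives $\partial\bar x^i/\partial x^h = M_{ih}$ are constants. The matrix `M` and vector `A` below are hypothetical sample values. The sketch confirms that $T^{ij} = A^iA^j$, transformed by (17.20) as a type-(2,0) tensor, equals the tensor product of the transformed vectors. Integer entries keep the comparison exact.

```python
def outer(u, v):
    # tensor product of two contravariant vectors: (u ⊗ v)^{ij} = u^i v^j
    return [[ui * vj for vj in v] for ui in u]

def apply(M, v):
    # contravariant rule (17.18) with constant Jacobian: vbar^i = M[i][k] v^k
    return [sum(M[i][k] * v[k] for k in range(len(v))) for i in range(len(M))]

def transform_20(M, T):
    # (17.20) for type (2,0): Tbar^{ij} = M[i][h] M[j][p] T^{hp}
    n = len(M)
    return [[sum(M[i][h] * M[j][p] * T[h][p] for h in range(n) for p in range(n))
             for j in range(n)] for i in range(n)]

M = [[2, 1], [0, 3]]   # hypothetical linear change of coordinates
A = [1, -2]
T = outer(A, A)        # T^{ij} = A^i A^j, type (2,0)
print(transform_20(M, T) == outer(apply(M, A), apply(M, A)))  # True
```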


Contraction

Given a tensor of type $(r, s)$, take a covariant index and set it equal to a
contravariant index, i.e., sum over those two indices. The process is called
contraction and the end result is a tensor of type $(r - 1, s - 1)$. For example,
take the tensor of type (2,1) whose components are $T^{ij}_k$ and set $k = j$. How
do the components $T^{ij}_j$ transform?

$$\bar T^{ij}_j = \frac{\partial\bar x^i}{\partial x^h}\frac{\partial\bar x^j}{\partial x^p}\frac{\partial x^q}{\partial\bar x^j}T^{hp}_q = \frac{\partial\bar x^i}{\partial x^h}\underbrace{\frac{\partial x^q}{\partial\bar x^j}\frac{\partial\bar x^j}{\partial x^p}}_{=\,\delta^q_p}T^{hp}_q = \frac{\partial\bar x^i}{\partial x^h}T^{hq}_q.$$

This shows that $T^{ij}_j$ transform as components of a contravariant vector [see
Equation (17.18)], i.e., a tensor of type (1,0).

Of special interest is a tensor of type (1,1). When you contract this
tensor, you get a tensor of type (0,0), i.e., a scalar. For example, let $\mathbf{A}$ be
a contravariant vector with components $A^i$ and $\mathbf{B}$ a covariant vector with
components $B_j$. Then $T^i_j = A^iB_j$ is a tensor of type (1,1). When you
contract it, you get $T^i_i = A^iB_i$, which is the dot product of the two vectors,
i.e., a scalar [see Equation (17.16)].
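The sketch below uses an assumed linear change of coordinates with a constant Jacobian `M`, whose inverse `M_inv` plays the role of $\partial x^k/\partial\bar x^j$. It transforms $T^i_j = A^iB_j$ as a (1,1) tensor and shows that the contraction $T^i_i$ is unchanged. Python's `fractions.Fraction` is used so that the equality is exact rather than approximate.

```python
from fractions import Fraction as F

M = [[F(2), F(1)], [F(0), F(3)]]                # d xbar^i / d x^h (hypothetical, constant)
M_inv = [[F(1, 2), F(-1, 6)], [F(0), F(1, 3)]]  # d x^k / d xbar^j, the inverse of M

def transform_11(T):
    # Tbar^i_j = (d xbar^i / d x^h)(d x^k / d xbar^j) T^h_k
    return [[sum(M[i][h] * M_inv[k][j] * T[h][k] for h in range(2) for k in range(2))
             for j in range(2)] for i in range(2)]

def contract(T):
    # set the upper index equal to the lower one and sum: T^i_i
    return sum(T[i][i] for i in range(len(T)))

A = [F(3), F(-1)]                     # contravariant components A^i
B = [F(2), F(5)]                      # covariant components B_j
T = [[a * b for b in B] for a in A]   # T^i_j = A^i B_j, type (1,1)
print(contract(T), contract(transform_11(T)))  # both equal A^i B_i = 1
```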
Symmetrization

Some important tensors in physics have the property that when two of their
indices are interchanged the tensor does not change or it changes sign. In the
first case, we say that the tensor is symmetric; in the second case, antisymmetric. For example, if $\mathbf{T}$ is a tensor of type (2,0) and $\mathbf{U}$ of type (0,2), and if

$$T^{ij} = T^{ji} \quad\text{and}\quad U_{ij} = -U_{ji},$$

then $\mathbf{T}$ is symmetric and $\mathbf{U}$ is antisymmetric.

Given any tensor, one can always construct from it a tensor which is
symmetric or antisymmetric in the interchange of any pair of its indices. In
particular, if $\mathbf{T}$ is any tensor of type (2,0), then the tensors $\mathbf{S}$ and $\mathbf{A}$ with
components

$$S^{ij} = \tfrac{1}{2}\left(T^{ij} + T^{ji}\right) \quad\text{and}\quad A^{ij} = \tfrac{1}{2}\left(T^{ij} - T^{ji}\right)$$

are called the symmetric and antisymmetric parts of $\mathbf{T}$, and

$$T^{ij} = \tfrac{1}{2}\left(T^{ij} + T^{ji}\right) + \tfrac{1}{2}\left(T^{ij} - T^{ji}\right) = S^{ij} + A^{ij}. \tag{17.27}$$

The symmetric part $S^{ij}$ is sometimes denoted by $T^{(ij)}$ and the antisymmetric
part $A^{ij}$ by $T^{[ij]}$.
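Decomposition (17.27) is easy to carry out on the component array of a (2,0) tensor. In the sketch below (sample components are arbitrary; the function name is ours), the returned `S` and `A` are symmetric and antisymmetric, respectively, and add back up to the original array. Exact fractions avoid any rounding in the halves.

```python
from fractions import Fraction

def sym_antisym(T):
    """Split T^{ij} into S^{ij} = (T^{ij} + T^{ji})/2 and A^{ij} = (T^{ij} - T^{ji})/2."""
    n = len(T)
    half = Fraction(1, 2)
    S = [[half * (T[i][j] + T[j][i]) for j in range(n)] for i in range(n)]
    A = [[half * (T[i][j] - T[j][i]) for j in range(n)] for i in range(n)]
    return S, A

T = [[1, 2, 0], [4, 5, 6], [7, 8, 9]]  # arbitrary sample components
S, A = sym_antisym(T)
print(S[0][1], A[0][1])  # 3 -1
```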


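17.2.2 Numerical Tensors

There are certain “constant” tensors which play important roles in tensor
analysis. We have seen one such tensor already: the (1,1)-type Kronecker
delta. In fact, all the so-called numerical tensors are built from this fundamental tensor. The generalized Kronecker delta is defined as

$$\delta^{i_1\cdots i_r}_{j_1\cdots j_r} = \det\begin{pmatrix}
\delta^{i_1}_{j_1} & \delta^{i_1}_{j_2} & \cdots & \delta^{i_1}_{j_r}\\
\delta^{i_2}_{j_1} & \delta^{i_2}_{j_2} & \cdots & \delta^{i_2}_{j_r}\\
\vdots & \vdots & \ddots & \vdots\\
\delta^{i_r}_{j_1} & \delta^{i_r}_{j_2} & \cdots & \delta^{i_r}_{j_r}
\end{pmatrix}. \tag{17.28}$$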


The determinant of an $r\times r$ matrix is a sum of terms, each consisting
of the product of $r$ matrix elements. In (17.28), each term is a product of
$r$ Kronecker deltas. Since the Kronecker delta is a (1,1)-type tensor, each
term is an $(r, r)$-type tensor by the multiplication rule (17.26), and hence so are the
determinant and the generalized Kronecker delta.

It is clear from (17.28) that the upper indices label the rows and the lower
indices the columns of the matrix. Thus interchanging any two of the upper
indices is equivalent to interchanging two rows of the matrix. This changes
the sign of the determinant. Similarly for the interchange of two columns.
Box 17.2.4. The generalized Kronecker delta is a completely antisymmetric tensor in its upper and lower indices: interchanging any two of its
upper indices or any two of its lower indices changes its sign.


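Example 17.2.4. In this example, we demonstrate a useful property of the generalized Kronecker delta. We illustrate the property for $r = 3$ and $n = 3$,² but the
result can easily be generalized. Expand the determinant of $\delta^{ijk}_{lmp}$ about the last row,
starting from the right:

$$\begin{aligned}
\delta^{ijk}_{lmp} &= \det\begin{pmatrix}\delta^i_l & \delta^i_m & \delta^i_p\\ \delta^j_l & \delta^j_m & \delta^j_p\\ \delta^k_l & \delta^k_m & \delta^k_p\end{pmatrix}\\
&= \delta^k_p\det\begin{pmatrix}\delta^i_l & \delta^i_m\\ \delta^j_l & \delta^j_m\end{pmatrix} - \delta^k_m\det\begin{pmatrix}\delta^i_l & \delta^i_p\\ \delta^j_l & \delta^j_p\end{pmatrix} + \delta^k_l\det\begin{pmatrix}\delta^i_m & \delta^i_p\\ \delta^j_m & \delta^j_p\end{pmatrix}\\
&= \delta^k_p\delta^{ij}_{lm} - \delta^k_m\delta^{ij}_{lp} + \delta^k_l\delta^{ij}_{mp}.
\end{aligned}$$

Now contract over the indices $k$ and $p$ to obtain

$$\delta^{ijk}_{lmk} = \delta^k_k\delta^{ij}_{lm} - \delta^k_m\delta^{ij}_{lk} + \delta^k_l\delta^{ij}_{mk} = 3\delta^{ij}_{lm} - \delta^{ij}_{lm} + \delta^{ij}_{ml} = 2\delta^{ij}_{lm} + \delta^{ij}_{ml} = \delta^{ij}_{lm} = \delta^i_l\delta^j_m - \delta^i_m\delta^j_l,$$

where in the next to the last step we used the antisymmetry of the generalized Kronecker delta. Note that because of the antisymmetry of the generalized Kronecker
delta in both upper and lower indices, we can move both the upper and the lower
last indices to the beginning:

$$\delta^{i_1\cdots i_r}_{j_1\cdots j_r} = \delta^{i_ri_1\cdots i_{r-1}}_{j_rj_1\cdots j_{r-1}}.$$

In particular,

$$\delta^{kij}_{klm} = \delta^{ijk}_{lmk} = \delta^i_l\delta^j_m - \delta^i_m\delta^j_l.$$

■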

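The procedure of the example above can be generalized to arbitrary $r$ and
$n$. Furthermore, one can contract over more than one pair of indices. The
result is the following useful identity:

$$\delta^{i_1\cdots i_s i_{s+1}\cdots i_r}_{j_1\cdots j_s i_{s+1}\cdots i_r} = \frac{(n - s)!}{(n - r)!}\,\delta^{i_1\cdots i_s}_{j_1\cdots j_s}. \tag{17.29}$$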
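Both the definition (17.28) and the identity (17.29) can be tested by brute force for small $n$. The sketch below computes the determinant by the Leibniz permutation formula, a deliberately simple choice that is fine for the tiny matrices involved. It builds the generalized Kronecker delta from that determinant, contracts the last $r - s$ index pairs, and compares the result with the factor $(n - s)!/(n - r)!$. Indices run from 0 to $n - 1$, and all function names are ours.

```python
from itertools import permutations, product
from math import factorial

def perm_sign(p):
    """+1 or -1 according to the parity (number of inversions) of the tuple p."""
    inversions = sum(1 for a in range(len(p)) for b in range(a + 1, len(p)) if p[a] > p[b])
    return -1 if inversions % 2 else 1

def det(m):
    """Leibniz formula: sum over permutations p of sign(p) * prod_row m[row][p[row]]."""
    total = 0
    for p in permutations(range(len(m))):
        term = perm_sign(p)
        for row, col in enumerate(p):
            term *= m[row][col]
        total += term
    return total

def gen_delta(upper, lower):
    """(17.28): determinant of the matrix with entries delta^{upper[a]}_{lower[b]}."""
    return det([[1 if i == j else 0 for j in lower] for i in upper])

def contracted(upper, lower, pairs, n):
    """Sum of gen_delta over `pairs` extra indices, each appearing once up and once down."""
    return sum(gen_delta(upper + c, lower + c) for c in product(range(n), repeat=pairs))

def check_17_29(n, r, s):
    """Compare the contracted delta with (n-s)!/(n-r)! times the rank-s delta, for all free indices."""
    factor = factorial(n - s) // factorial(n - r)
    return all(contracted(up, lo, r - s, n) == factor * gen_delta(up, lo)
               for up in product(range(n), repeat=s)
               for lo in product(range(n), repeat=s))

print(gen_delta((0, 1, 2), (0, 1, 2)), gen_delta((1, 0, 2), (0, 1, 2)))  # 1 -1
print(check_17_29(3, 3, 2), check_17_29(4, 3, 1), check_17_29(3, 2, 0))  # True True True
```

The case `check_17_29(3, 3, 2)` is exactly the result of Example 17.2.4.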


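From the generalized Kronecker delta two other important numerical tensors are built. These are called the Levi-Civita symbols. They are defined
as follows:

$$\epsilon_{i_1\cdots i_n} = \delta^{12\cdots n}_{i_1\cdots i_n} \quad\text{and}\quad \epsilon^{j_1\cdots j_n} = \delta^{j_1\cdots j_n}_{12\cdots n}. \tag{17.30}$$

Note that both Levi-Civita symbols are antisymmetric in all their indices and
will thus vanish if any two of their indices are equal. Moreover,

$$\epsilon_{12\cdots n} = \delta^{12\cdots n}_{12\cdots n} = 1 \quad\text{and}\quad \epsilon^{12\cdots n} = \delta^{12\cdots n}_{12\cdots n} = 1, \tag{17.31}$$

so that we have

$$\epsilon_{i_1\cdots i_n} = \epsilon^{i_1\cdots i_n} = \begin{cases}+1 & \text{if } i_1\cdots i_n \text{ is an even permutation of } 1, 2, \ldots, n,\\ -1 & \text{if } i_1\cdots i_n \text{ is an odd permutation of } 1, 2, \ldots, n,\\ 0 & \text{otherwise.}\end{cases} \tag{17.32}$$

² Recall that $n$ is the dimension of the space.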
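Rule (17.32) amounts to computing the parity of a permutation. A minimal sketch, assuming indices that run over $0, \ldots, n - 1$ instead of $1, \ldots, n$, counts inversions (pairs that appear out of order). An even count gives $+1$, an odd count $-1$, and a repeated or out-of-range index gives 0. The name `levi_civita` is ours.

```python
def levi_civita(*idx):
    """(17.32) with indices running over 0..n-1: +1 for an even permutation of (0, ..., n-1),
    -1 for an odd one, and 0 otherwise (in particular if any index repeats)."""
    n = len(idx)
    if sorted(idx) != list(range(n)):
        return 0
    inversions = sum(1 for a in range(n) for b in range(a + 1, n) if idx[a] > idx[b])
    return -1 if inversions % 2 else 1

print(levi_civita(0, 1, 2), levi_civita(1, 0, 2), levi_civita(0, 0, 2))  # 1 -1 0
```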
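Now consider the quantity

$$A^{j_1\cdots j_n}_{i_1\cdots i_n} \equiv \epsilon_{i_1\cdots i_n}\epsilon^{j_1\cdots j_n} - \delta^{j_1\cdots j_n}_{i_1\cdots i_n},$$

which is clearly antisymmetric in all its upper as well as lower indices. This
means that every component of $A$ is either zero or equal to $\pm A^{12\cdots n}_{12\cdots n}$.
But $A^{12\cdots n}_{12\cdots n}$ is zero by (17.31) and the definition (17.28) of the generalized Kronecker delta. We have just
shown the following important result:

$$\epsilon_{i_1\cdots i_n}\epsilon^{j_1\cdots j_n} = \delta^{j_1\cdots j_n}_{i_1\cdots i_n}. \tag{17.33}$$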
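For $n = 3$, identity (17.33) involves only $3^6 = 729$ index combinations, so it can be checked exhaustively. The sketch uses the compact closed form $\epsilon_{ijk} = (i - j)(j - k)(k - i)/2$ for indices taken from $\{0, 1, 2\}$, a standard shortcut used here in place of the permutation count, together with an explicit $3\times3$ determinant for (17.28).

```python
def eps3(i, j, k):
    """Levi-Civita symbol for n = 3 via the closed form (i-j)(j-k)(k-i)/2, indices in {0, 1, 2}."""
    return (i - j) * (j - k) * (k - i) // 2

def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def gen_delta3(upper, lower):
    """(17.28) for r = 3: rows labeled by the upper indices, columns by the lower ones."""
    return det3([[1 if u == l else 0 for l in lower] for u in upper])

def check_17_33():
    """True if eps_{ijk} eps^{lmp} equals delta^{lmp}_{ijk} for all 729 index choices."""
    r3 = range(3)
    return all(eps3(i, j, k) * eps3(l, m, p) == gen_delta3((l, m, p), (i, j, k))
               for i in r3 for j in r3 for k in r3
               for l in r3 for m in r3 for p in r3)

print(check_17_33())  # True
```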


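17.3 Metric Tensor


Let $\{x'^i\}$ denote a set of Cartesian coordinates, and $\{x^j\}$ some other coordinates of which $\{x'^i\}$ are functions. We then have

$$dx'^i = \frac{\partial x'^i}{\partial x^j}dx^j \qquad \text{(sum over } j \text{ implied as usual).}$$

The element of length (squared), which is customarily denoted by $ds^2$, in
the Cartesian coordinate system is

$$ds^2 = (dx'^1)^2 + (dx'^2)^2 + \cdots + (dx'^n)^2 = \sum_{i=1}^{n}(dx'^i)^2.$$

In terms of the other coordinates, this can be written as

$$ds^2 = \sum_{i=1}^{n}(dx'^i)^2 = \sum_{i=1}^{n}\frac{\partial x'^i}{\partial x^j}dx^j\frac{\partial x'^i}{\partial x^k}dx^k = \left(\sum_{i=1}^{n}\frac{\partial x'^i}{\partial x^j}\frac{\partial x'^i}{\partial x^k}\right)dx^j\,dx^k.$$

The expression in parentheses, denoted by $g_{jk}(x)$, is a symmetric tensor of type (0,2), which, as indicated, is a function of the $\{x^j\}$:

$$g_{jk}(x) = \sum_{i=1}^{n}\frac{\partial x'^i}{\partial x^j}\frac{\partial x'^i}{\partial x^k}. \tag{17.34}$$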

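That $g_{jk}(x)$ is symmetric should be obvious. To show that it is a tensor
of type (0,2), let $\{\bar x^k\}$ be some new set of coordinates of which $\{x'^i\}$ are
functions. We assume that all functional dependences are invertible. This
means that $\{\bar x^k\}$ can be thought of as functions of $\{x'^i\}$, and through $\{x'^i\}$,
as functions of $\{x^j\}$. In terms of the $\bar x$ variables,

$$\bar g_{jk}(\bar x) = \sum_{i=1}^{n}\frac{\partial x'^i}{\partial\bar x^j}\frac{\partial x'^i}{\partial\bar x^k}.$$

Using the chain rule, this can be written as

$$\bar g_{jk}(\bar x) = \sum_{i=1}^{n}\frac{\partial x'^i}{\partial x^p}\frac{\partial x^p}{\partial\bar x^j}\frac{\partial x'^i}{\partial x^q}\frac{\partial x^q}{\partial\bar x^k} = \underbrace{\left(\sum_{i=1}^{n}\frac{\partial x'^i}{\partial x^p}\frac{\partial x'^i}{\partial x^q}\right)}_{=\,g_{pq}(x)}\frac{\partial x^p}{\partial\bar x^j}\frac{\partial x^q}{\partial\bar x^k} = \frac{\partial x^p}{\partial\bar x^j}\frac{\partial x^q}{\partial\bar x^k}g_{pq}(x),$$

which shows that $g_{pq}$ transforms as a (0,2)-type tensor. In terms of this
tensor, $ds^2$ is written as

$$ds^2 = \sum_{i=1}^{n}(dx'^i)^2 = g_{jk}(x)\,dx^j\,dx^k. \tag{17.35}$$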


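The matrix whose elements are $g_{pq}$ is invertible. In fact, consider

$$h^{km}(x) = \sum_{p=1}^{n}\frac{\partial x^k}{\partial x'^p}\frac{\partial x^m}{\partial x'^p},$$

which the reader can show to be a tensor of type (2,0). Then

$$\begin{aligned}
g_{jk}(x)h^{km}(x) &= \left(\sum_{i=1}^{n}\frac{\partial x'^i}{\partial x^j}\frac{\partial x'^i}{\partial x^k}\right)\left(\sum_{p=1}^{n}\frac{\partial x^k}{\partial x'^p}\frac{\partial x^m}{\partial x'^p}\right)\\
&= \sum_{i,p=1}^{n}\frac{\partial x'^i}{\partial x^j}\frac{\partial x'^i}{\partial x'^p}\frac{\partial x^m}{\partial x'^p} = \sum_{i,p=1}^{n}\frac{\partial x'^i}{\partial x^j}\delta_{ip}\frac{\partial x^m}{\partial x'^p} = \sum_{i=1}^{n}\frac{\partial x'^i}{\partial x^j}\frac{\partial x^m}{\partial x'^i} = \frac{\partial x^m}{\partial x^j} = \delta^m_j,
\end{aligned}$$

where on the second line use was made of the chain rule and Box 17.1.5.
This equation shows that the matrix whose elements are $h^{km}(x)$ is inverse
to the matrix whose elements are $g_{jk}(x)$. It is common to use the same
symbol for the inverse as for the original tensor. Thus, instead of $h^{km}(x)$, we
use $g^{km}(x)$.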
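For a concrete check of (17.34) and of the inverse just constructed, take plane polar coordinates $(r, \theta)$ as the assumed non-Cartesian $\{x^j\}$, with $x'^1 = r\cos\theta$ and $x'^2 = r\sin\theta$. The sketch builds $g_{jk}$ and $h^{km}$ from hand-written Jacobians and confirms that their product is the identity. One expects $g = \mathrm{diag}(1, r^2)$ and $h = \mathrm{diag}(1, 1/r^2)$.

```python
import math

def metric_and_inverse(r, theta):
    """g_jk from (17.34) and h^km = sum_p (d x^k/d x'^p)(d x^m/d x'^p) for polar (r, theta)."""
    c, s = math.cos(theta), math.sin(theta)
    # D[i][j] = d x'^i / d x^j, with x'^1 = r cos(theta), x'^2 = r sin(theta)
    D = [[c, -r * s],
         [s, r * c]]
    # E[k][p] = d x^k / d x'^p, from r = sqrt(x'^2 + y'^2), theta = atan2(y', x')
    E = [[c, s],
         [-s / r, c / r]]
    g = [[sum(D[i][j] * D[i][k] for i in range(2)) for k in range(2)] for j in range(2)]
    h = [[sum(E[k][p] * E[m][p] for p in range(2)) for m in range(2)] for k in range(2)]
    return g, h

g, h = metric_and_inverse(2.0, 0.6)
gh = [[sum(g[j][k] * h[k][m] for k in range(2)) for m in range(2)] for j in range(2)]
print(g)   # approximately [[1, 0], [0, 4]]
print(gh)  # approximately the identity
```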

The (0,2)-tensor $g_{jk}(x)$ was defined in terms of the transformation rule
between a Cartesian and a second coordinate system. It turns out that one
can abstract the properties of $g_{jk}(x)$ and define the metric tensor:
Box 17.3.1. A metric tensor $\mathbf{g}$ with components $g_{ij}$ is a symmetric
type-(0,2) tensor whose matrix has an inverse $\mathbf{g}^{-1}$ with components $g^{km}$.
Every metric tensor defines a geometry in which the (square of the)
element of length $ds^2$ is given by

$$ds^2 = g_{ij}(x)\,dx^i\,dx^j,$$

where $\{x^i\}$ are some appropriate coordinates in that geometry.


The word “geometry” in this Box is used rather loosely. A precise definition of “geometry” is beyond the scope of this book. Nevertheless, we mention
that the notion of geometry starts with the concept of a manifold, which is a
“space” that locally looks like a Euclidean space. For example, the surface of
a sphere is a two-dimensional manifold, because a very small area of a sphere
looks like a two-dimensional Euclidean space, i.e., a flat plane. Mathematicians study manifolds that have no metric tensors defined on them. However,
in physics, almost all manifolds have a metric, and this metric defines the
geometry of that manifold.

In our discussion of the inner product in Section 6.1.2, we also encountered
the metric tensor, although we called it the metric matrix. There, we defined
the notion of positive definiteness. In the context of the discussion here, this
property becomes the cornerstone of a special kind of geometry: if $ds^2$ of
Box 17.3.1 is always strictly greater than zero for nonzero $dx^i$ and $dx^j$, then
the manifold on which $g_{ij}$ is defined is called a Riemannian manifold.
Relativity requires manifolds that are not Riemannian, i.e., for which $ds^2$ can
be zero or negative.

Geometry is an intrinsic property of a space, while $g_{ij}(x)$ depends on the
coordinates used. This is evident in Equation (17.35), where $ds^2$ is given in
terms of Cartesian coordinates as well as the other general coordinates. Despite this coordinate dependence, the metric tensor does define the geometry
of a manifold. In fact, there are some quantities obtained from the metric
which characterize the intrinsic geometry of the manifold. We shall return to
this discussion later.

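Example 17.3.1. Let us find the metric tensor in spherical coordinates. Use spherical coordinate symbols as indices, with $r$, $\theta$, and $\varphi$ as first, second, and third coordinates, respectively. Recalling that $x'^1 = x$, $x'^2 = y$, and $x'^3 = z$, with

$$x = r\sin\theta\cos\varphi, \quad y = r\sin\theta\sin\varphi, \quad z = r\cos\theta,$$

and using Equation (17.34), we get

$$g_{rr}(r,\theta,\varphi) = \left(\frac{\partial x}{\partial r}\right)^2 + \left(\frac{\partial y}{\partial r}\right)^2 + \left(\frac{\partial z}{\partial r}\right)^2 = (\sin\theta\cos\varphi)^2 + (\sin\theta\sin\varphi)^2 + (\cos\theta)^2 = 1,$$

$$\begin{aligned}
g_{r\theta}(r,\theta,\varphi) &= \frac{\partial x}{\partial r}\frac{\partial x}{\partial\theta} + \frac{\partial y}{\partial r}\frac{\partial y}{\partial\theta} + \frac{\partial z}{\partial r}\frac{\partial z}{\partial\theta} \\
&= (\sin\theta\cos\varphi)(r\cos\theta\cos\varphi) + (\sin\theta\sin\varphi)(r\cos\theta\sin\varphi) + (\cos\theta)(-r\sin\theta) \\
&= r\sin\theta\cos\theta\cos^2\varphi + r\sin\theta\cos\theta\sin^2\varphi - r\cos\theta\sin\theta = 0.
\end{aligned}$$

Similarly, the reader can show that $g_{r\varphi} = 0$, and in fact all the off-diagonal elements vanish. On the other hand,

$$g_{\theta\theta}(r,\theta,\varphi) = \left(\frac{\partial x}{\partial\theta}\right)^2 + \left(\frac{\partial y}{\partial\theta}\right)^2 + \left(\frac{\partial z}{\partial\theta}\right)^2 = (r\cos\theta\cos\varphi)^2 + (r\cos\theta\sin\varphi)^2 + (-r\sin\theta)^2 = r^2,$$

$$g_{\varphi\varphi}(r,\theta,\varphi) = \left(\frac{\partial x}{\partial\varphi}\right)^2 + \left(\frac{\partial y}{\partial\varphi}\right)^2 + \left(\frac{\partial z}{\partial\varphi}\right)^2 = (-r\sin\theta\sin\varphi)^2 + (r\sin\theta\cos\varphi)^2 = r^2\sin^2\theta.$$

Therefore,

$$ds^2 = (dr)^2 + r^2(d\theta)^2 + r^2\sin^2\theta\,(d\varphi)^2 = dr^2 + r^2\,d\theta^2 + r^2\sin^2\theta\,d\varphi^2,$$

which agrees with Equation (2.25). Note how the parentheses have been removed from around the differentials. This is a very common (albeit inaccurate) practice. ■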
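As a numerical cross-check of Example 17.3.1, the short Python sketch below builds the Jacobian of the map from $(r,\theta,\varphi)$ to $(x,y,z)$ by central finite differences and forms $g_{ij} = \sum_p (\partial x'^p/\partial x^i)(\partial x'^p/\partial x^j)$, which is our reading of Equation (17.34). The sample point and the step size `h` are arbitrary choices, and finite differences only approximate the exact derivatives, so the comparison with $\mathrm{diag}(1, r^2, r^2\sin^2\theta)$ is made within a tolerance.

```python
import math

def to_cartesian(q):
    """Map spherical coordinates (r, theta, phi) to Cartesian (x, y, z)."""
    r, th, ph = q
    return (r * math.sin(th) * math.cos(ph),
            r * math.sin(th) * math.sin(ph),
            r * math.cos(th))

def metric(q, h=1e-6):
    """Approximate g_ij = sum_p (dx'^p/dq^i)(dx'^p/dq^j); the Jacobian
    entries are central finite differences with step h."""
    n = len(q)
    jac = []                                  # jac[i][p] = dx'^p / dq^i
    for i in range(n):
        qp, qm = list(q), list(q)
        qp[i] += h
        qm[i] -= h
        fp, fm = to_cartesian(qp), to_cartesian(qm)
        jac.append([(a - b) / (2 * h) for a, b in zip(fp, fm)])
    return [[sum(jac[i][p] * jac[j][p] for p in range(3)) for j in range(n)]
            for i in range(n)]

r, th, ph = 2.0, 0.8, 1.3                     # arbitrary sample point
g = metric((r, th, ph))
expected = [[1.0, 0.0, 0.0],
            [0.0, r**2, 0.0],
            [0.0, 0.0, (r * math.sin(th))**2]]
max_error = max(abs(g[i][j] - expected[i][j]) for i in range(3) for j in range(3))
print(f"largest deviation from diag(1, r^2, r^2 sin^2 theta): {max_error:.2e}")
```

The same `metric` function works for any coordinate map if `to_cartesian` is replaced, which makes it a convenient way to check answers to problems such as 17.8.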


17.3.1 Index Raising and Lowering 

After Box 17.1.6, we mentioned that the length of a covariant or contravariant vector cannot be defined without a metric tensor. Now that we have a metric tensor, we can define it. In fact, we can do better! We can define the dot product of any two vectors. If one vector is covariant and the other contravariant, their dot product is the usual one: the sum of the products of their components, as shown in (17.16). If both vectors $\mathbf{A}$ and $\mathbf{B}$ are contravariant, define the dot product as

$$\mathbf{A}\cdot\mathbf{B} = g_{ij}A^iB^j, \qquad (17.36)$$

and if both vectors are covariant, define the dot product as

$$\mathbf{A}\cdot\mathbf{B} = g^{ij}A_iB_j. \qquad (17.37)$$

The reader can routinely show that $\mathbf{A}\cdot\mathbf{B} = \mathbf{B}\cdot\mathbf{A}$ in both cases.

Equations (17.36) and (17.37) have an interesting interpretation. Take the first equation and recall from Equation (17.26) that the product $g_{ij}A^k$ is a tensor of type (1,2). Contracting the indices $i$ and $k$ turns that into a tensor of type (0,1), i.e., a covariant vector, say $\mathbf{C}$ with components $C_j$. But now note that

$$\mathbf{C}\cdot\mathbf{A} = C_jA^j = g_{ij}A^iA^j = \mathbf{A}\cdot\mathbf{A}.$$

It is therefore natural to denote $g_{ij}A^i$, which is equal to $g_{ji}A^i$ because of the symmetry of the metric tensor, by $A_j$. Thus, the metric tensor $g_{ij}$ provides us with a way of changing contravariant vectors to covariant vectors, i.e., lowering their indices. Similar arguments show that the inverse of the metric tensor, $g^{ij}$, can be used to raise indices; and these two processes are consistent, in the sense that if we lower the index of a contravariant vector with $g_{ij}$ and then raise the index of the resulting covariant vector with $g^{ij}$, we get the original contravariant vector. Here is a proof. Let $C^k = g^{kj}A_j$, where $A_j$ is the covariant vector obtained from $A^i$. Then,

$$C^k = g^{kj}A_j = g^{kj}g_{ij}A^i = g^{kj}g_{ji}A^i = \delta^k_i A^i = A^k,$$

and the original contravariant vector is restored. The process of raising and lowering of indices works for arbitrary tensors:


Box 17.3.2. Any contravariant index $i$ of a general tensor can be made into a covariant index $j$ by multiplying the component that includes $i$ by $g_{ij}$ and summing over $i$. Any covariant index $i$ of a general tensor can be made into a contravariant index $j$ by multiplying the component that includes $i$ by $g^{ij}$ and summing over $i$.


In Cartesian coordinates the (Euclidean) metric tensor is just the Kronecker 
delta. Therefore 


$$A^j = g^{ij}A_i = \delta^{ij}A_i = A_j \qquad \text{in Cartesian coordinates with Euclidean metric,} \qquad (17.38)$$

and the distinction between covariant and contravariant vectors (and indices) 
disappears. 

In special relativity and in Cartesian coordinates, the metric tensor is $\eta_{\alpha\beta}$, whose matrix is given in Equation (8.8). This tensor has components

$$\eta_{00} = 1, \quad \eta_{11} = \eta_{22} = \eta_{33} = -1, \quad \eta_{\alpha\beta} = 0 \text{ if } \alpha \ne \beta \qquad \text{in special relativity.}$$

The inverse of $\eta_{\alpha\beta}$ is itself: $\eta^{\alpha\beta} = \eta_{\alpha\beta}$. In raising and lowering of an index, the time component does not change, while the space components change sign (see Box 17.2.3 for the meaning of Greek and Roman indices in relativity):

$$A_\alpha = \eta_{\alpha\beta}A^\beta \;\Rightarrow\; A_0 = A^0, \quad A_i = -A^i. \qquad (17.39)$$
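The sketch below illustrates lowering and raising with exact rational arithmetic, using Python's `fractions` module. The two-dimensional metric and its inverse are made up for illustration only; any symmetric invertible matrix would do. It checks that raising undoes lowering, that the three forms of the dot product in (17.16), (17.36), and (17.37) agree, and that lowering with $\eta_{\alpha\beta}$ flips the signs of the space components as in (17.39).

```python
from fractions import Fraction as Fr

def contract(m, v):
    """Return w_j = m_{ji} v_i: lowers an index when m = g_ij,
    raises one when m = g^ij."""
    n = len(v)
    return [sum(m[j][i] * v[i] for i in range(n)) for j in range(n)]

def bilinear(m, a, b):
    """Return m_ij a_i b_j, summed over i and j."""
    n = len(a)
    return sum(m[i][j] * a[i] * b[j] for i in range(n) for j in range(n))

# Hypothetical symmetric 2x2 metric and its exact inverse (g g^-1 = 1).
g = [[Fr(2), Fr(1)], [Fr(1), Fr(3)]]
g_inv = [[Fr(3, 5), Fr(-1, 5)], [Fr(-1, 5), Fr(2, 5)]]

A_up, B_up = [Fr(1), Fr(-4)], [Fr(7), Fr(2)]    # contravariant components
A_dn, B_dn = contract(g, A_up), contract(g, B_up)

restored = contract(g_inv, A_dn)                  # raise the lowered index
mixed = sum(a * b for a, b in zip(A_dn, B_up))    # A_i B^i, as in (17.16)
both_up = bilinear(g, A_up, B_up)                 # g_ij A^i B^j, (17.36)
both_down = bilinear(g_inv, A_dn, B_dn)           # g^ij A_i B_j, (17.37)

# Minkowski metric eta = diag(1, -1, -1, -1) is its own inverse.
eta = [[1 if a == b else 0 for b in range(4)] for a in range(4)]
eta[1][1] = eta[2][2] = eta[3][3] = -1
V_up = [5, 1, 2, 3]
V_dn = contract(eta, V_up)                        # (17.39)

print(restored == A_up, mixed, both_up, both_down, V_dn)
```

Exact fractions are used so that the equalities can be tested with `==` rather than a floating-point tolerance.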



Example 17.3.2. The Levi-Civita symbols are conveniently used to express the components of the cross product of two vectors in Cartesian coordinate systems. Since there is no difference between covariant and contravariant indices in Cartesian coordinate systems, we use only covariant indices:

$$(\mathbf{A}\times\mathbf{B})_i = \epsilon_{ijk}A_jB_k, \qquad i = 1,2,3, \qquad (17.40)$$

where a sum over $j$ and $k$ is understood. As practice in index manipulation, the reader is urged to verify the above relation. The order of the two vectors on both sides of the equation is important!

Using Equation (17.40) and some properties of the Levi-Civita symbol, we can derive the bac cab rule:

$$\mathbf{A}\times(\mathbf{B}\times\mathbf{C}) = \mathbf{B}(\mathbf{A}\cdot\mathbf{C}) - \mathbf{C}(\mathbf{A}\cdot\mathbf{B}).$$

Start with a general component of the LHS and work through index manipulations until you reach the corresponding component of the RHS:

$$\begin{aligned}
\{\mathbf{A}\times(\mathbf{B}\times\mathbf{C})\}_i &= \epsilon_{ijk}A_j(\mathbf{B}\times\mathbf{C})_k = \epsilon_{ijk}A_j\,\epsilon_{kmn}B_mC_n \\
&= \epsilon_{kij}\epsilon_{kmn}A_jB_mC_n = (\delta_{im}\delta_{jn} - \delta_{in}\delta_{jm})A_jB_mC_n \\
&= \delta_{im}\delta_{jn}A_jB_mC_n - \delta_{in}\delta_{jm}A_jB_mC_n = A_jB_iC_j - A_jB_jC_i \\
&= B_i(A_jC_j) - C_i(A_jB_j) = B_i(\mathbf{A}\cdot\mathbf{C}) - C_i(\mathbf{A}\cdot\mathbf{B}).
\end{aligned}$$

On the second line we used (17.33) and the result obtained in Example 17.2.4. The last expression above is the $i$th component of the RHS of the bac cab rule. ■
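A brute-force check of (17.40) and of the bac cab rule: the Python sketch below encodes $\epsilon_{ijk}$ with 0-based indices (a choice made here for convenience), tests the contraction identity that the derivation takes from (17.33), and compares both sides of the rule on a few arbitrary integer vectors. Passing these checks on sample vectors is, of course, not a proof.

```python
from itertools import product

def eps(i, j, k):
    """Levi-Civita symbol with indices in {0, 1, 2}: +1, -1, or 0."""
    return (i - j) * (j - k) * (k - i) // 2

def delta(i, j):
    return 1 if i == j else 0

def cross(A, B):
    """(A x B)_i = eps_ijk A_j B_k, Eq. (17.40), summed over j and k."""
    return [sum(eps(i, j, k) * A[j] * B[k] for j in range(3) for k in range(3))
            for i in range(3)]

def dot(A, B):
    return sum(a * b for a, b in zip(A, B))

# Contraction identity: eps_kij eps_kmn = delta_im delta_jn - delta_in delta_jm
for i, j, m, n in product(range(3), repeat=4):
    lhs = sum(eps(k, i, j) * eps(k, m, n) for k in range(3))
    assert lhs == delta(i, m) * delta(j, n) - delta(i, n) * delta(j, m)

A, B, C = [1, 2, 3], [-4, 0, 5], [2, -1, 7]      # arbitrary integer vectors
assert cross([1, 0, 0], [0, 1, 0]) == [0, 0, 1]  # e_1 x e_2 = e_3
assert cross(B, A) == [-x for x in cross(A, B)]  # the order of the vectors matters
bac_cab = [b * dot(A, C) - c * dot(A, B) for b, c in zip(B, C)]
print(cross(A, cross(B, C)), bac_cab)
```

Integer vectors keep every quantity exact, so the comparison needs no tolerance.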


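Example 17.3.3. Example 15.3.1 calculated the angular momentum differential operator using Cartesian coordinates. To illustrate the power of indices and the ease with which they allow some complex manipulations, we redo the calculation of Example 15.3.1 using indices.

We have $-L^2f = (\mathbf{r}\times\nabla)\cdot(\mathbf{r}\times\nabla)f$. Letting $\partial_j$ stand for the partial derivative with respect to $x_j$, using the Einstein summation convention, and recalling that no raising or lowering of indices is necessary for Euclidean space, we write

$$-L^2f = (\mathbf{r}\times\nabla)_i(\mathbf{r}\times\nabla)_i f = (\epsilon_{ijk}x_j\partial_k)(\epsilon_{ilm}x_l\partial_m)f = \epsilon_{ijk}\epsilon_{ilm}\,x_j\partial_k(x_l\partial_m f),$$

where we used (17.40). Continuing, refer to (17.33) and write the above equation as

$$\begin{aligned}
-L^2f &= (\delta_{jl}\delta_{km} - \delta_{jm}\delta_{kl})\,x_j\partial_k(x_l\partial_m f) = x_j\partial_k(x_j\partial_k f) - x_j\partial_k(x_k\partial_j f) \\
&= x_j\delta_{kj}\partial_k f + x_jx_j\partial_k\partial_k f - x_j\delta_{kk}\partial_j f - x_jx_k\partial_k\partial_j f \\
&= x_j\partial_j f + r^2\nabla^2 f - 3x_j\partial_j f - x_jx_k\partial_k\partial_j f = r^2\nabla^2 f - 2(\mathbf{r}\cdot\nabla)f - x_jx_k\partial_k\partial_j f,
\end{aligned} \qquad (17.41)$$

because $\partial_k x_j = \delta_{kj}$, $x_jx_j = r^2$, $\delta_{kk} = 3$, $x_j\partial_j = \mathbf{r}\cdot\nabla$, and $\partial_k\partial_k = \nabla^2$. The last term in (17.41) above can be found from the following relation:

$$x_k\partial_k(x_j\partial_j f) = x_k\delta_{kj}\partial_j f + x_kx_j\partial_k\partial_j f = x_j\partial_j f + x_kx_j\partial_k\partial_j f,$$

or

$$x_kx_j\partial_k\partial_j f = (\mathbf{r}\cdot\nabla)^2 f - (\mathbf{r}\cdot\nabla)f.$$

Substituting in (17.41) yields Equation (15.22). Compare this derivation with the laborious calculation of Example 15.3.1! ■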
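To confirm the index algebra of (17.41) without any numerical error, the Python sketch below represents polynomials in $(x_1,x_2,x_3)$ as dictionaries from exponent tuples to integer coefficients, a representation chosen only for this illustration. For one arbitrary test polynomial it checks the relation $x_kx_j\partial_k\partial_j f = (\mathbf{r}\cdot\nabla)^2f - (\mathbf{r}\cdot\nabla)f$, and it compares $(\mathbf{r}\times\nabla)\cdot(\mathbf{r}\times\nabla)f$, computed directly from (17.40), with the last line of (17.41).

```python
from collections import defaultdict

# A polynomial in (x1, x2, x3) is a dict {(e1, e2, e3): integer coefficient}.

def add(*ps):
    """Sum of polynomials, dropping zero coefficients."""
    out = defaultdict(int)
    for p in ps:
        for e, c in p.items():
            out[e] += c
    return {e: c for e, c in out.items() if c != 0}

def scale(p, a):
    return add({e: a * c for e, c in p.items()})

def d(p, i):
    """Partial derivative d_i p."""
    out = {}
    for e, c in p.items():
        if e[i] > 0:
            e2 = list(e)
            e2[i] -= 1
            out[tuple(e2)] = c * e[i]
    return out

def times_x(p, j):
    """Multiply p by x_j."""
    out = {}
    for e, c in p.items():
        e2 = list(e)
        e2[j] += 1
        out[tuple(e2)] = c
    return out

def eps(i, j, k):
    """Levi-Civita symbol, indices in {0, 1, 2}."""
    return (i - j) * (j - k) * (k - i) // 2

def r_dot_grad(p):                    # (r . grad) p = x_j d_j p
    return add(*(times_x(d(p, j), j) for j in range(3)))

def r_cross_grad(p, i):               # (r x grad)_i p = eps_ijk x_j d_k p
    return add(*(scale(times_x(d(p, k), j), eps(i, j, k))
                 for j in range(3) for k in range(3)))

def laplacian(p):
    return add(*(d(d(p, j), j) for j in range(3)))

def r2_times(p):                      # r^2 p = x_j x_j p
    return add(*(times_x(times_x(p, j), j) for j in range(3)))

# Arbitrary, non-homogeneous test polynomial: x1^2 x2 + 3 x2 x3^3 - 5 x3 + 7
f = {(2, 1, 0): 1, (0, 1, 3): 3, (0, 0, 1): -5, (0, 0, 0): 7}

# Last term of (17.41): x_j x_k d_k d_j f
last = add(*(times_x(times_x(d(d(f, j), k), k), j)
             for j in range(3) for k in range(3)))
relation_holds = last == add(r_dot_grad(r_dot_grad(f)), scale(r_dot_grad(f), -1))

# -L^2 f directly from (r x grad).(r x grad) f, and from the last line of (17.41)
direct = add(*(r_cross_grad(r_cross_grad(f, i), i) for i in range(3)))
via_41 = add(r2_times(laplacian(f)), scale(r_dot_grad(f), -2), scale(last, -1))
print(relation_holds, direct == via_41)
```

For each homogeneous piece of degree $q$ the last term reduces to $q(q-1)$ times that piece, in line with Problem 17.2.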

17.3.2 Tensors and Electrodynamics 

Relativity was a logical outcome of electromagnetic theory. It should therefore come as no surprise that the equations of electromagnetism find their most natural form in the language of relativity and the tensors associated with it. In the discussion that follows, it is convenient and common practice to set the speed of light equal to 1; then, since $c = 1/\sqrt{\epsilon_0\mu_0}$, we have

$$c = 1, \qquad \frac{1}{\epsilon_0} = \mu_0.$$

Consider the Lorentz force law

$$\mathbf{f} = q(\mathbf{E} + \mathbf{v}\times\mathbf{B}) \quad\text{or}\quad f_i = q\,(E_i + \epsilon_{ijk}v_jB_k), \qquad (17.42)$$

where, as in Example 17.3.2, we used covariant indices for all tensors in the second equation. Since this is the fundamental force of electromagnetism, we expect it to have a natural expression in relativity.

As a starting point, we note that the magnetic part is of the form $v_jF_{ij}$, where $F_{ij} = \epsilon_{ijk}B_k$ is an antisymmetric tensor of rank two. The obvious generalization that might lead to a connection with relativity is to consider an expression of the form $u^\beta F_{\alpha\beta}$, where $u^\beta$ is the velocity 4-vector and $F_{\alpha\beta}$ is an antisymmetric tensor of rank two which reduces to $F_{ij}$ when both $\alpha$ and $\beta$ are nonzero. Let us look at $u^\beta F_{\alpha\beta}$ when $\alpha$ is $i$:

$$u^\beta F_{i\beta} = u^0F_{i0} + u^jF_{ij},$$

where we used the convention of Example 17.2.2. Equation (8.21) now gives $u^0 = \gamma$ and $u^i = \gamma v^i$. Then the equation above gives

$$u^\beta F_{i\beta} = \gamma F_{i0} + \gamma v^jF_{ij} = \gamma\left(F_{i0} + v^jF_{ij}\right) = \gamma\left(F_{i0} + v_j\epsilon_{ijk}B_k\right),$$

where in the last step, we disregarded the difference between covariant and contravariant indices. Comparison with Equation (17.42) shows that it is natural to set $F_{i0} = E_i$. The second-rank antisymmetric tensor $F_{\alpha\beta}$ is called the electromagnetic field tensor.
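The identification $F_{i0} = E_i$, $F_{ij} = \epsilon_{ijk}B_k$ can be checked numerically. The Python sketch below uses index 0 for time and 1 to 3 for space; the field values and the velocity are arbitrary, with $c = 1$. It builds $F_{\alpha\beta}$, confirms its antisymmetry, and verifies that $u^\beta F_{i\beta}$ equals $\gamma(E_i + \epsilon_{ijk}v_jB_k)$, so that the Lorentz force (17.42) is $f_i = (q/\gamma)\,u^\beta F_{i\beta}$.

```python
import math

def eps(i, j, k):
    """Levi-Civita symbol, indices in {0, 1, 2}."""
    return (i - j) * (j - k) * (k - i) // 2

def field_tensor(E, B):
    """F_{alpha beta} with both indices down; index 0 = time, 1..3 = space.
    F_{i0} = E_i, F_{0i} = -E_i, F_{ij} = eps_ijk B_k."""
    F = [[0.0] * 4 for _ in range(4)]
    for i in range(3):
        F[i + 1][0] = E[i]
        F[0][i + 1] = -E[i]
        for j in range(3):
            F[i + 1][j + 1] = sum(eps(i, j, k) * B[k] for k in range(3))
    return F

E = [0.3, -1.2, 2.0]                      # arbitrary field values
B = [1.5, 0.4, -0.7]
v = [0.2, -0.5, 0.1]                      # speed below 1 (c = 1)
gamma = 1 / math.sqrt(1 - sum(x * x for x in v))
u = [gamma] + [gamma * x for x in v]      # u^beta = (gamma, gamma v), Eq. (8.21)

F = field_tensor(E, B)
uF = [sum(u[b] * F[i + 1][b] for b in range(4)) for i in range(3)]   # u^beta F_{i beta}
vxB = [sum(eps(i, j, k) * v[j] * B[k] for j in range(3) for k in range(3))
       for i in range(3)]
expected = [gamma * (E[i] + vxB[i]) for i in range(3)]
print(uF, expected)
```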

Maxwell’s equations (15.29) take an especially simple form when written in terms of the electromagnetic field tensor. The first equation can be written as

$$\partial_i F_{i0} = \frac{\partial E_i}{\partial x^i} = \frac{\rho}{\epsilon_0} = \mu_0\rho. \qquad (17.43)$$

The obvious generalization of the left-hand side to relativity is $\partial F_{\alpha\beta}/\partial x^\alpha$. But there is something wrong with this! Both $\alpha$'s are lower indices (recall that the superscript of a coordinate in the denominator leads to a subscript), and you cannot sum over them. In the Euclidean case, this causes no problem because by (17.38), there is no difference between lower and upper indices, and we can simply raise one of the $i$'s. In relativity, however, there is a difference. So, we have to introduce the (inverse) $\eta$ tensor. The left-hand side now becomes

$$\eta^{\alpha\nu}\frac{\partial F_{\alpha\beta}}{\partial x^\nu}.$$

Since $\beta$ is a free index, we expect the right-hand side to have a free index as well. So, we write the generalization of Maxwell’s first equation as

$$\eta^{\alpha\nu}\frac{\partial F_{\alpha\beta}}{\partial x^\nu} = \mu_0 V_\beta, \qquad (17.44)$$

with $V_\beta$ to be determined. For $\beta = 0$, we get

$$\eta^{\alpha\nu}\frac{\partial F_{\alpha 0}}{\partial x^\nu} = \mu_0 V_0, \quad\text{or}\quad \eta^{ij}\frac{\partial F_{i0}}{\partial x^j} = \mu_0 V_0, \quad\text{or}\quad -\frac{\partial F_{i0}}{\partial x^i} = \mu_0 V_0,$$

where we used the fact that $F_{\alpha\beta}$ is antisymmetric, so all its “diagonal” components are zero. We also used the fact that $\eta$ is diagonal with the space elements being $-1$. Comparing with (17.43), we see that $V_0 = -\rho$.
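Now let $\beta = i$ in (17.44). Then

$$\eta^{\alpha\nu}\frac{\partial F_{\alpha i}}{\partial x^\nu} = \mu_0 V_i, \quad\text{or}\quad \eta^{00}\frac{\partial F_{0i}}{\partial x^0} + \eta^{jk}\frac{\partial F_{ji}}{\partial x^k} = \mu_0 V_i,$$

or

$$\frac{\partial F_{0i}}{\partial t} - \frac{\partial F_{ji}}{\partial x^j} = \mu_0 V_i, \quad\text{or}\quad -\frac{\partial E_i}{\partial t} + \epsilon_{ijk}\partial_j B_k = \mu_0 V_i.$$

This is the $i$th component of the vector equation

$$-\frac{\partial\mathbf{E}}{\partial t} + \nabla\times\mathbf{B} = \mu_0\mathbf{V}.$$

Comparing this with the fourth Maxwell’s equation, we identify $\mathbf{V}$ as $\mathbf{J}$. Thus, the first and fourth equations, the inhomogeneous Maxwell’s equations, are combined into

$$\eta^{\alpha\nu}\frac{\partial F_{\alpha\beta}}{\partial x^\nu} = \mu_0 J_\beta, \qquad (17.45)$$

where $J_\beta = (-\rho, \mathbf{J})$ is the 4-current. We leave it to the reader to verify that

$$\frac{\partial F_{\alpha\beta}}{\partial x^\nu} + \frac{\partial F_{\nu\alpha}}{\partial x^\beta} + \frac{\partial F_{\beta\nu}}{\partial x^\alpha} = 0 \qquad (17.46)$$

combines the second and third equations, the homogeneous Maxwell’s equations.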

Equation (17.46) is satisfied if $F_{\alpha\beta} = \partial_\alpha A_\beta - \partial_\beta A_\alpha$ for any 4-vector $A_\alpha$, as the reader can easily verify. For $\alpha = i$ and $\beta = 0$, this gives

$$F_{i0} = \partial_iA_0 - \partial_0A_i, \quad\text{or}\quad E_i = \partial_iA_0 - \partial_0A_i, \quad\text{or}\quad \mathbf{E} = \nabla A_0 - \frac{\partial\mathbf{A}}{\partial t}.$$

Comparing this with (15.31) identifies $A_0$ with the negative of the scalar potential $\Phi$ and $\mathbf{A}$ with the vector potential. We can thus write

$$F_{\alpha\beta} = \partial_\alpha A_\beta - \partial_\beta A_\alpha, \qquad A_\alpha = (-\Phi, \mathbf{A}). \qquad (17.47)$$

Now that we have solved the homogeneous Maxwell’s equations by introducing the 4-potential, we can insert the result in (17.45) to write the inhomogeneous Maxwell’s equations in terms of the 4-potential as well. We then have

$$\eta^{\alpha\nu}\partial_\nu\left(\partial_\alpha A_\beta - \partial_\beta A_\alpha\right) = \mu_0J_\beta,$$

or

$$\eta^{\alpha\nu}\partial_\nu\partial_\alpha A_\beta - \partial_\beta\left(\eta^{\alpha\nu}\partial_\nu A_\alpha\right) = \mu_0J_\beta. \qquad (17.48)$$

The expression in parentheses, when set equal to zero, gives the Lorentz gauge condition [see Equation (15.32)]. The remaining part of the equation gives the wave equation for $\mathbf{A}$ and $\Phi$.
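The claim that (17.47) solves the homogeneous equations (17.46) for any potential can be checked exactly on polynomial potentials. In the Python sketch below, coordinates are ordered $(t, x, y, z)$, a polynomial is a dictionary from exponent tuples to integer coefficients, and the particular potential $A_\alpha$ is an arbitrary made-up example. The cyclic sum in (17.46) comes out identically zero for every choice of indices because mixed partial derivatives commute.

```python
from itertools import product

def d(p, i):
    """Partial derivative d/dx^i of a polynomial {exponent tuple: coefficient}."""
    out = {}
    for e, c in p.items():
        if e[i] > 0:
            e2 = list(e)
            e2[i] -= 1
            out[tuple(e2)] = c * e[i]
    return out

def add(*ps):
    """Sum of polynomials, dropping zero coefficients."""
    out = {}
    for p in ps:
        for e, c in p.items():
            out[e] = out.get(e, 0) + c
    return {e: c for e, c in out.items() if c != 0}

def neg(p):
    return {e: -c for e, c in p.items()}

# Hypothetical 4-potential A_alpha(t, x, y, z); exponents ordered (t, x, y, z).
A = [
    {(1, 2, 0, 0): -3, (0, 0, 1, 1): -1},   # A_0 = -Phi = -(3 t x^2 + y z)
    {(2, 0, 1, 0): 1, (0, 1, 1, 1): 2},     # A_1 = t^2 y + 2 x y z
    {(0, 3, 0, 0): 4, (1, 0, 0, 2): -1},    # A_2 = 4 x^3 - t z^2
    {(1, 1, 1, 0): 5},                      # A_3 = 5 t x y
]

# Eq. (17.47): F_{alpha beta} = d_alpha A_beta - d_beta A_alpha
F = [[add(d(A[b], a), neg(d(A[a], b))) for b in range(4)] for a in range(4)]

# Eq. (17.46): cyclic sum of derivatives, for every alpha, beta, nu
violations = [(a, b, n) for a, b, n in product(range(4), repeat=3)
              if add(d(F[a][b], n), d(F[n][a], b), d(F[b][n], a))]
print("E_x =", F[1][0])               # F_{10} = E_1 as a polynomial
print("violations of (17.46):", violations)
```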
17.4 Differentiation of Tensors

Tensors represent many quantities whose variation with coordinates (points in space) has physical significance. Therefore, the notion of a derivative of a tensor becomes important. Although we can always differentiate components of a tensor (they are just functions), the resulting derivative is not necessarily a tensor. To obtain a tensor, one needs to generalize the concept of the derivative, as we do in this section.


17.4.1 Covariant Differential and Affine Connection 


Let us begin by noting that the differentials of coordinates form the components of a contravariant vector. In fact, when the new coordinates $\bar{x}^i$ are written as functions of the old coordinates $x^j$ and one takes the differential of the new coordinates, one obtains

$$d\bar{x}^i = \frac{\partial\bar{x}^i}{\partial x^j}\,dx^j, \qquad (17.49)$$

which is precisely the way a contravariant vector transforms. In fact, this is the archetypal example of a contravariant vector, and can be a guide in helping the reader remember the rule of transformation of the contravariant components of a tensor.

The differential of a scalar, a tensor of type (0,0), is again a scalar, because

$$d\phi = \frac{\partial\phi}{\partial x^i}\,dx^i,$$

and the first factor gives the components of a covariant vector [see Equation (17.15)], and the second the components of a contravariant vector (as shown above).

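Next take the differential of a contravariant vector $A^i$. How does it transform? By taking the differential of the transformation rule

$$\bar{A}^i = \frac{\partial\bar{x}^i}{\partial x^j}A^j, \qquad (17.50)$$

one obtains

$$d\bar{A}^i = \frac{\partial\bar{x}^i}{\partial x^j}\,dA^j + d\!\left(\frac{\partial\bar{x}^i}{\partial x^j}\right)A^j = \frac{\partial\bar{x}^i}{\partial x^j}\,dA^j + \frac{\partial^2\bar{x}^i}{\partial x^k\,\partial x^j}\,dx^kA^j. \qquad (17.51)$$

If the second term on the right were absent, $dA^j$ would transform as a contravariant vector. It turns out that one can add something to $dA^j$ whose effect is to cancel the unwanted term.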

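Consider quantities $\Gamma^j_{mp}$, which transform according to

$$\bar{\Gamma}^j_{mp} = \frac{\partial\bar{x}^j}{\partial x^i}\frac{\partial x^h}{\partial\bar{x}^m}\frac{\partial x^k}{\partial\bar{x}^p}\,\Gamma^i_{hk} - \frac{\partial^2\bar{x}^j}{\partial x^h\,\partial x^k}\frac{\partial x^h}{\partial\bar{x}^m}\frac{\partial x^k}{\partial\bar{x}^p}. \qquad (17.52)$$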
Any set of three-indexed symbols $\Gamma^j_{mp}$ which transform according to this equation is said to constitute the components of an affine connection. An affine connection is not a tensor because of the second term on the right-hand side of (17.52). Since this term is the same for all affine connections, the difference between two affine connections is a tensor of type (1,2). If $\Gamma^j_{mp}$ and $\Lambda^j_{mp}$ are any two affine connections, then

$$\bar{\Gamma}^j_{mp} - \bar{\Lambda}^j_{mp} = \frac{\partial\bar{x}^j}{\partial x^i}\frac{\partial x^h}{\partial\bar{x}^m}\frac{\partial x^k}{\partial\bar{x}^p}\left(\Gamma^i_{hk} - \Lambda^i_{hk}\right), \qquad (17.53)$$

showing that $\Gamma^i_{hk} - \Lambda^i_{hk}$ transform as components of a tensor of type (1,2). In particular, if $\Lambda^i_{hk} = \Gamma^i_{kh}$, then the difference $\Gamma^i_{hk} - \Gamma^i_{kh}$ is, up to a factor of 2, the antisymmetric part of the affine connection $\Gamma$:

$$\Gamma^i_{hk} = \underbrace{\tfrac{1}{2}\left(\Gamma^i_{hk} + \Gamma^i_{kh}\right)}_{\text{symmetric part}} + \underbrace{\tfrac{1}{2}\left(\Gamma^i_{hk} - \Gamma^i_{kh}\right)}_{\text{antisymmetric part}}.$$

The antisymmetric part of an affine connection is called its torsion tensor. Clearly, if it vanishes in one coordinate system then it vanishes in all coordinates (the zero tensor is zero in all coordinate systems). Thus, the torsion tensor of an affine connection is zero if and only if the connection is symmetric.

Lack of tensorial character of the affine connection is precisely what is needed to turn $dA^j$, as well as $dA_j$, into a tensor:

Box 17.4.1. For any affine connection $\Gamma^j_{kl}$, the quantities $DA^j$ and $DA_j$ defined by

$$DA^j = dA^j + \Gamma^j_{kl}A^k\,dx^l \quad\text{and}\quad DA_j = dA_j - \Gamma^k_{jl}A_k\,dx^l$$

are, respectively, the components of a contravariant and a covariant vector. They are called the covariant or absolute differential of the vectors.


We show that $DA^j$ is a contravariant vector, leaving the proof of the second claim to the reader. In the bar coordinates, we have

$$D\bar{A}^j = d\bar{A}^j + \bar{\Gamma}^j_{kl}\bar{A}^k\,d\bar{x}^l.$$
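Using Equations (17.49), (17.50), (17.51), and (17.52), we obtain

$$\begin{aligned}
D\bar{A}^j &= \frac{\partial\bar{x}^j}{\partial x^k}\,dA^k + \frac{\partial^2\bar{x}^j}{\partial x^k\,\partial x^l}\,dx^kA^l \\
&\quad + \left(\frac{\partial\bar{x}^j}{\partial x^p}\frac{\partial x^q}{\partial\bar{x}^k}\frac{\partial x^r}{\partial\bar{x}^l}\,\Gamma^p_{qr} - \frac{\partial^2\bar{x}^j}{\partial x^q\,\partial x^r}\frac{\partial x^q}{\partial\bar{x}^k}\frac{\partial x^r}{\partial\bar{x}^l}\right)\frac{\partial\bar{x}^k}{\partial x^m}A^m\,\frac{\partial\bar{x}^l}{\partial x^s}\,dx^s \\
&= \frac{\partial\bar{x}^j}{\partial x^k}\,dA^k + \frac{\partial^2\bar{x}^j}{\partial x^k\,\partial x^l}\,dx^kA^l + \frac{\partial\bar{x}^j}{\partial x^p}\,\underbrace{\frac{\partial x^q}{\partial\bar{x}^k}\frac{\partial\bar{x}^k}{\partial x^m}}_{=\,\delta^q_m}\,\underbrace{\frac{\partial x^r}{\partial\bar{x}^l}\frac{\partial\bar{x}^l}{\partial x^s}}_{=\,\delta^r_s}\,\Gamma^p_{qr}A^m\,dx^s \\
&\quad - \frac{\partial^2\bar{x}^j}{\partial x^q\,\partial x^r}\,\underbrace{\frac{\partial x^q}{\partial\bar{x}^k}\frac{\partial\bar{x}^k}{\partial x^m}}_{=\,\delta^q_m}\,\underbrace{\frac{\partial x^r}{\partial\bar{x}^l}\frac{\partial\bar{x}^l}{\partial x^s}}_{=\,\delta^r_s}\,A^m\,dx^s \\
&= \frac{\partial\bar{x}^j}{\partial x^k}\,dA^k + \frac{\partial^2\bar{x}^j}{\partial x^k\,\partial x^l}\,dx^kA^l + \frac{\partial\bar{x}^j}{\partial x^p}\,\Gamma^p_{qr}A^q\,dx^r - \frac{\partial^2\bar{x}^j}{\partial x^q\,\partial x^r}\,A^q\,dx^r.
\end{aligned}$$

The second term cancels the last term (remember that you can use any symbol for the dummy indices that are summed over). Therefore,

$$D\bar{A}^j = \frac{\partial\bar{x}^j}{\partial x^k}\,dA^k + \frac{\partial\bar{x}^j}{\partial x^p}\,\Gamma^p_{qr}A^q\,dx^r = \frac{\partial\bar{x}^j}{\partial x^k}\left(dA^k + \Gamma^k_{qr}A^q\,dx^r\right) = \frac{\partial\bar{x}^j}{\partial x^k}\,DA^k,$$

which is the transformation rule of a contravariant vector.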

Absolute differential can be defined for any tensor. For a scalar $\phi$, $D\phi = d\phi$. In the case of other tensors, an affine connection term with a positive sign is introduced for each contravariant index, and an affine connection term with a negative sign for each covariant index. For example, the covariant differential of $T^{ij}_k$ is a tensor of type (2,1) given by

$$DT^{ij}_k = dT^{ij}_k + \left(\Gamma^i_{pq}T^{pj}_k + \Gamma^j_{pq}T^{ip}_k - \Gamma^p_{kq}T^{ij}_p\right)dx^q.$$

Covariant differential has all the properties of ordinary differential when applied to tensors. For example, the covariant differential of the sum of two tensors of type $(r,s)$ is the sum of their covariant differentials, a tensor of type $(r,s)$, and $D(aT) = a\,DT$ for any constant $a$ and any tensor $T$. Covariant differential also obeys the Leibniz rule:

$$D(T\otimes S) = DT\otimes S + T\otimes DS. \qquad (17.54)$$


17.4.2 Covariant Derivative 

In the first equation of Box 17.4.1, write $dA^j$ in terms of partial derivatives. Then, the equation becomes

$$DA^j = \left(\frac{\partial A^j}{\partial x^l} + \Gamma^j_{kl}A^k\right)dx^l.$$


Since the left-hand side and $dx^l$ are contravariant vectors, we suspect that the expression in parentheses is a tensor of type (1,1). This can in fact be shown directly. It is called the covariant derivative of $A^j$ with respect to $x^l$ and denoted by $A^j_{;l}$. Thus,

$$A^j_{;l} \equiv \frac{\partial A^j}{\partial x^l} + \Gamma^j_{kl}A^k. \qquad (17.55)$$


This is the generalization of ordinary derivative to situations in which the affine connection is nonzero. Covariant derivative can similarly be defined for covariant vectors as well as arbitrary tensors. For example, the covariant derivative of $T^{ij}_k$ is a tensor of type (2,2) given by

$$T^{ij}_{k;q} = \frac{\partial T^{ij}_k}{\partial x^q} + \Gamma^i_{pq}T^{pj}_k + \Gamma^j_{pq}T^{ip}_k - \Gamma^p_{kq}T^{ij}_p.$$

Consider a curve in Euclidean space parametrized by $t$, and use Cartesian coordinates. Let $A^j(t)$ be the value of a vector field at a point on the curve. If $dA^j/dt = 0$, then the vector is constant along the curve, and we say that the vector is parallel translated along the curve. When the affine connection is nonzero, we divide both sides of the first equation in Box 17.4.1 by $dt$ (which on the left we denote by $Dt$ for aesthetic reasons), and say that a contravariant vector field is parallel translated along a curve if

$$\frac{DA^j}{Dt} = 0 \quad\text{or}\quad \frac{dA^j}{dt} + \Gamma^j_{kl}A^k\frac{dx^l}{dt} = 0, \qquad (17.56)$$

with a similar definition for a covariant vector field. Since $A^j$ depends on $t$ only through the coordinates, we use the chain rule $dA^j/dt = (\partial A^j/\partial x^l)\,dx^l/dt$ to rewrite the equation above as

$$\frac{DA^j}{Dt} = \left(\frac{\partial A^j}{\partial x^l} + \Gamma^j_{kl}A^k\right)\frac{dx^l}{dt} = A^j_{;l}\frac{dx^l}{dt} = A^j_{;l}\,\dot{x}^l = 0. \qquad (17.57)$$


A curve whose tangent vector is parallel translated along that curve is called a geodesic. The components of the vector tangent to a curve are $dx^j/dt = \dot{x}^j$. If we substitute this in (17.56), we obtain the following second-order differential equation, called the geodesic equation:

$$\frac{D\dot{x}^j}{Dt} = 0, \quad\text{or}\quad \frac{d^2x^j}{dt^2} + \Gamma^j_{kl}\frac{dx^k}{dt}\frac{dx^l}{dt} = 0, \quad\text{or}\quad \ddot{x}^j + \Gamma^j_{kl}\dot{x}^k\dot{x}^l = 0, \qquad (17.58)$$

where each superposed dot represents a differentiation with respect to $t$. Solving this differential equation yields the parametric equation of a geodesic.


17.4.3 Metric Connection 

The affine connection, which is defined by its transformation property (17.52), is completely arbitrary. One can define covariant differentials and covariant derivatives in terms of any set of quantities that transform according to Equation (17.52). With a metric tensor, however, one can define a unique symmetric (therefore, torsion-free) affine connection, called the metric connection, given by

$$\Gamma^j_{kl} = \Gamma^j_{lk} = \frac{1}{2}g^{jm}\left(\frac{\partial g_{mk}}{\partial x^l} + \frac{\partial g_{ml}}{\partial x^k} - \frac{\partial g_{kl}}{\partial x^m}\right) = g^{jm}\Gamma_{mkl}, \qquad (17.59)$$

where

$$\Gamma_{mkl} \equiv \frac{1}{2}\left(\frac{\partial g_{mk}}{\partial x^l} + \frac{\partial g_{ml}}{\partial x^k} - \frac{\partial g_{kl}}{\partial x^m}\right), \qquad (17.60)$$

with all lower indices, is easier to remember. Note that it is the first index of $\Gamma_{mkl}$ that is raised to give the components of the metric connection, and for this reason the metric connection is sometimes denoted by $\Gamma^{j}{}_{kl}$. The verification that (17.59) is indeed an affine connection, i.e., that it transforms according to (17.52), is straightforward but tedious.
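Equation (17.59) translates directly into code. The Python sketch below evaluates it for a sphere of radius $a$ (the value of `a` and the sample point are arbitrary), using central finite differences for the metric derivatives, and compares the result with the closed-form components quoted in Example 17.4.2. Indices 0 and 1 stand for $\theta$ and $\varphi$.

```python
import math

a = 2.0                      # sphere radius (arbitrary)
N = 2                        # index 0 = theta, 1 = phi

def metric(q):
    th = q[0]
    return [[a**2, 0.0], [0.0, (a * math.sin(th))**2]]

def inverse_2x2(g):
    det = g[0][0] * g[1][1] - g[0][1] * g[1][0]
    return [[g[1][1] / det, -g[0][1] / det], [-g[1][0] / det, g[0][0] / det]]

def d_metric(q, l, h=1e-6):
    """Central-difference approximation of d g_ij / d x^l; returns [i][j]."""
    qp, qm = list(q), list(q)
    qp[l] += h
    qm[l] -= h
    gp, gm = metric(qp), metric(qm)
    return [[(gp[i][j] - gm[i][j]) / (2 * h) for j in range(N)] for i in range(N)]

def metric_connection(q):
    dg = [d_metric(q, l) for l in range(N)]            # dg[l][i][j]
    g_inv = inverse_2x2(metric(q))
    # Eq. (17.60): Gamma_{mkl} = (d_l g_mk + d_k g_ml - d_m g_kl) / 2
    low = [[[0.5 * (dg[l][m][k] + dg[k][m][l] - dg[m][k][l])
             for l in range(N)] for k in range(N)] for m in range(N)]
    # Eq. (17.59): Gamma^j_{kl} = g^{jm} Gamma_{mkl}
    return [[[sum(g_inv[j][m] * low[m][k][l] for m in range(N))
              for l in range(N)] for k in range(N)] for j in range(N)]

th = 0.7                                               # arbitrary point
G = metric_connection((th, 0.3))
print("Gamma^theta_{phi phi} =", G[0][1][1], " expected", -math.sin(th) * math.cos(th))
print("Gamma^phi_{theta phi} =", G[1][0][1], " expected", 1 / math.tan(th))
```

Only `metric` depends on the sphere, so the same function can be pointed at other two-dimensional metrics.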


Example 17.4.1. If all components of a metric tensor are constant in some coordinate system, then all the components of the metric connection vanish. Note that this is true only in that particular coordinate system. Changing coordinates changes the affine connection, and in general, the components of a metric connection will not be zero even if they are zero in some coordinate system. If we use Cartesian coordinates, then the Euclidean metric is just the Kronecker delta. Therefore, all components of the metric connection are zero. Similarly, the metric of special relativity in Cartesian coordinates is $\eta_{\alpha\beta}$, whose components are either 0 or 1 or $-1$. Hence, all components of the metric connection of special relativity in Cartesian coordinates vanish. ■


The metric connection has some special properties which are of physical importance. The first property, which can be easily verified, is that

$$g_{ij;k} = 0 \quad\text{or}\quad \frac{\partial g_{ij}}{\partial x^k} - \Gamma^p_{jk}g_{ip} - \Gamma^p_{ik}g_{pj} = 0. \qquad (17.61)$$

The second property is that between any two sufficiently close points passes a single geodesic of the metric connection, and this geodesic extremizes the distance between the two points. If the geometry is Riemannian (i.e., if the metric is positive definite), then the geodesic gives the shortest distance. In relativity, where the metric is not Riemannian, the (timelike) geodesics give the longest distance.

Example 17.4.2. In this example, we find the geodesics of a sphere. The spherical angular coordinates $\theta$ and $\varphi$ can be used on the surface of a sphere of radius $a$. From the element of length $ds^2 = a^2\,d\theta^2 + a^2\sin^2\theta\,d\varphi^2$ on this sphere, and using $\theta$ and $\varphi$ to label components, we deduce that

$$g_{11} = g_{\theta\theta} = a^2, \quad g_{22} = g_{\varphi\varphi} = a^2\sin^2\theta, \quad g_{12} = g_{\theta\varphi} = g_{21} = g_{\varphi\theta} = 0,$$

and similarly,

$$g^{11} = g^{\theta\theta} = \frac{1}{a^2}, \quad g^{22} = g^{\varphi\varphi} = \frac{1}{a^2\sin^2\theta}, \quad g^{12} = g^{\theta\varphi} = g^{21} = g^{\varphi\theta} = 0.$$
Substituting these in (17.59), we can calculate the components of the affine connection. The nonzero components turn out to be

$$\Gamma^\theta_{\varphi\varphi} = -\sin\theta\cos\theta, \qquad \Gamma^\varphi_{\theta\varphi} = \Gamma^\varphi_{\varphi\theta} = \cot\theta.$$

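Using these in the geodesic equation (17.58), we obtain the following two differential equations:

$$\frac{d^2\theta}{dt^2} - \sin\theta\cos\theta\left(\frac{d\varphi}{dt}\right)^2 = 0, \qquad \frac{d^2\varphi}{dt^2} + 2\cot\theta\,\frac{d\varphi}{dt}\frac{d\theta}{dt} = 0. \qquad (17.62)$$

Multiplied by $\sin^2\theta$, the second equation becomes $\frac{d}{dt}\left(\sin^2\theta\,\frac{d\varphi}{dt}\right) = 0$, so it can be solved to give

$$\frac{d\varphi}{dt} = \frac{C}{\sin^2\theta} \quad\text{or}\quad dt = \frac{\sin^2\theta}{C}\,d\varphi, \qquad (17.63)$$

where $C$ is a constant of integration. Substituting this in the first equation of (17.62) gives

$$\frac{d^2\theta}{dt^2} - \frac{C^2\cos\theta}{\sin^3\theta} = 0. \qquad (17.64)$$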

To find the geodesic, it is more convenient to express $\theta$ as a function of $\varphi$. This means changing the independent variable in Equation (17.64) from $t$ to $\varphi$. This is done formally by using the second equation of (17.63) to substitute for $dt$ in (17.64). Thus, the first term of (17.64) can be written as

$$\frac{d}{dt}\left(\frac{d\theta}{dt}\right) = \frac{C}{\sin^2\theta}\frac{d}{d\varphi}\left(\frac{C}{\sin^2\theta}\frac{d\theta}{d\varphi}\right) = \frac{C^2}{\sin^2\theta}\frac{d}{d\varphi}\left(\frac{1}{\sin^2\theta}\frac{d\theta}{d\varphi}\right).$$

Substituting this in (17.64) yields

$$\frac{d}{d\varphi}\left(\frac{1}{\sin^2\theta}\frac{d\theta}{d\varphi}\right) - \cot\theta = 0.$$

Differentiating the first term, we get

$$-\frac{2\cos\theta}{\sin^3\theta}\left(\frac{d\theta}{d\varphi}\right)^2 + \frac{1}{\sin^2\theta}\frac{d^2\theta}{d\varphi^2} - \cot\theta = 0,$$

which can be simplified to the following differential equation:

$$\sin\theta\,\frac{d^2\theta}{d\varphi^2} - 2\cos\theta\left(\frac{d\theta}{d\varphi}\right)^2 - \sin^2\theta\cos\theta = 0. \qquad (17.65)$$

If we could solve this equation, we would find $\theta$ as a function of $\varphi$, and this should be the equation of a geodesic on a sphere. Instead, let us use our knowledge of the geodesics (curves giving the shortest distance) on a sphere, write it with $\theta$ as a function of $\varphi$, and see if it satisfies (17.65). Our sphere is parametrized as

$$x = a\sin\theta\cos\varphi, \quad y = a\sin\theta\sin\varphi, \quad z = a\cos\theta.$$

The great circles, curves of shortest distance, are the intersections of the sphere with planes passing through the origin. Such a plane has an equation of the form $Ax + By + Cz = 0$. The intersection with the sphere is obtained by substituting for $x$, $y$, and $z$ from the above equations:

$$Aa\sin\theta\cos\varphi + Ba\sin\theta\sin\varphi + Ca\cos\theta = 0.$$

Dividing by $Ca\sin\theta$ and redefining $A$ to be $-A/C$ and $B$ to be $-B/C$, we get

$$\cot\theta = A\cos\varphi + B\sin\varphi$$

as the equation of a geodesic on a sphere. It is straightforward to show that this equation indeed satisfies (17.65). ■
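The geodesic equations (17.62) can also be integrated numerically, which gives an independent check that their solutions are great circles. The Python sketch below uses a hand-written fourth-order Runge-Kutta step on the unit sphere (the radius drops out of (17.62)). The initial data, step size, and number of steps are arbitrary choices that keep the path away from the poles, where $\cot\theta$ blows up. Along the computed path it monitors the distance from the plane spanned by the initial position and velocity, and the first integral $\sin^2\theta\,d\varphi/dt = C$ of (17.63).

```python
import math

def rhs(state):
    """Geodesic equations (17.62) as a first-order system;
    state = (theta, phi, dtheta/dt, dphi/dt)."""
    th, ph, dth, dph = state
    return (dth,
            dph,
            math.sin(th) * math.cos(th) * dph**2,
            -2.0 * dth * dph / math.tan(th))

def rk4_step(state, dt):
    def shift(s, k, c):
        return tuple(x + c * y for x, y in zip(s, k))
    k1 = rhs(state)
    k2 = rhs(shift(state, k1, dt / 2))
    k3 = rhs(shift(state, k2, dt / 2))
    k4 = rhs(shift(state, k3, dt))
    return tuple(x + dt / 6 * (p + 2 * q + 2 * s + w)
                 for x, p, q, s, w in zip(state, k1, k2, k3, k4))

def point(th, ph):
    """Position on the unit sphere."""
    return (math.sin(th) * math.cos(ph), math.sin(th) * math.sin(ph), math.cos(th))

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

state = (1.0, 0.0, 0.3, 0.8)            # arbitrary initial (theta, phi, theta', phi')
th, ph, dth, dph = state
velocity = (math.cos(th) * math.cos(ph) * dth - math.sin(th) * math.sin(ph) * dph,
            math.cos(th) * math.sin(ph) * dth + math.sin(th) * math.cos(ph) * dph,
            -math.sin(th) * dth)
normal = cross(point(th, ph), velocity)  # normal to the expected great-circle plane
C = math.sin(th)**2 * dph                # first integral of Eq. (17.63)

max_plane_error = max_C_error = 0.0
for _ in range(2000):
    state = rk4_step(state, 1e-3)
    p = point(state[0], state[1])
    max_plane_error = max(max_plane_error, abs(sum(n * x for n, x in zip(normal, p))))
    max_C_error = max(max_C_error, abs(math.sin(state[0])**2 * state[3] - C))

print(f"max |n . p| = {max_plane_error:.1e}, "
      f"max |sin^2(theta) phi' - C| = {max_C_error:.1e}")
```

Both deviations stay at the level of the integration error, consistent with the great-circle solution derived above.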


17.5 Riemann Curvature Tensor 

Consider a closed loop, such as a rectangle, on a flat surface. Start a vector at one point of the rectangle (the lower left corner) and carry it parallel to itself to the point diagonally opposite the initial point [Figure 17.1(a)]. In one case, carry the vector to the right and then up. In the second case, carry the vector up and then to the right. Compare the two vectors obtained at the end of the two routes. They are equal. Do the same on a curved space such as the surface of a sphere. The two vectors at the end do not coincide [see Figure 17.1(b)]! The degree to which they are different is a measure of the curvature of the space.
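A closely related experiment is to carry a vector once around a closed loop and compare it with the original. The Python sketch below does this on the unit sphere along the circle $\theta = \theta_0$ (the value of $\theta_0$ is arbitrary), integrating the parallel-transport equation (17.56) with $\varphi$ as the parameter and the connection of Example 17.4.2. In the orthonormal components $(A^\theta, \sin\theta_0\,A^\varphi)$ the vector keeps its length but comes back rotated by the angle $2\pi\cos\theta_0$, a standard result that we use here as the expected value; on a flat plane it would return unchanged.

```python
import math

theta0 = 1.0          # colatitude of the loop (arbitrary choice)

def transport_rhs(A):
    """Eq. (17.56) along theta = theta0, phi = t, with the connection of
    Example 17.4.2:  dA^theta/dt = sin(theta0) cos(theta0) A^phi,
                     dA^phi/dt   = -cot(theta0) A^theta."""
    a_theta, a_phi = A
    return (math.sin(theta0) * math.cos(theta0) * a_phi,
            -a_theta / math.tan(theta0))

def rk4_step(A, h):
    k1 = transport_rhs(A)
    k2 = transport_rhs(tuple(x + h / 2 * k for x, k in zip(A, k1)))
    k3 = transport_rhs(tuple(x + h / 2 * k for x, k in zip(A, k2)))
    k4 = transport_rhs(tuple(x + h * k for x, k in zip(A, k3)))
    return tuple(x + h / 6 * (p + 2 * q + 2 * s + w)
                 for x, p, q, s, w in zip(A, k1, k2, k3, k4))

steps = 4000
A = (1.0, 0.0)                       # start as the unit vector along e_theta
for _ in range(steps):
    A = rk4_step(A, 2 * math.pi / steps)

# Orthonormal components on the unit sphere: (A^theta, sin(theta0) A^phi).
a_hat, b_hat = A[0], math.sin(theta0) * A[1]
angle = 2 * math.pi * math.cos(theta0)   # expected rotation angle
print(a_hat, b_hat, math.cos(angle), -math.sin(angle))
```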

Let us quantify the notion of the curvature. Suppose that the lower and upper curves of the “rectangle” are parametrized by $t$ and the right and left curves by $s$. Moving along a curve parametrized by $t$ does not change $s$, and vice versa. Using a Taylor expansion, in which derivatives are replaced by covariant derivatives, parallel translate a contravariant vector $A^j$ first to the right and then upward [see Figure 17.1(b) for clarification]. Assume that the lower left corner has $(t, s)$ as the parameter values. As you move along the lower curve, the parameters change from $(t,s)$ to $(t+\Delta t, s)$. So, to first order in $\Delta t$, we have

$$A^j(t+\Delta t, s) = A^j(t,s) + \frac{DA^j}{Dt}\Delta t = A^j(t,s) + A^j_{;l}(t,s)\frac{dx^l}{dt}\Delta t.$$

Figure 17.1: (a) In a flat space, the direction of the vector does not change when carried along two different paths. (b) In a curved space, the two vectors are different.
Now parallel translate this vector upward, the direction in which $t$ is constant but $s$ changes:

$$\begin{aligned}
A^j(t+\Delta t, s+\Delta s) &= A^j(t+\Delta t, s) + \frac{D}{Ds}\big(A^j(t+\Delta t, s)\big)\Delta s \\
&= A^j(t,s) + \frac{DA^j}{Dt}\Delta t + \frac{D}{Ds}\left(A^j(t,s) + A^j_{;l}(t,s)\frac{dx^l}{dt}\Delta t\right)\Delta s \\
&= A^j(t,s) + \frac{DA^j}{Dt}\Delta t + \frac{DA^j}{Ds}\Delta s + \frac{D}{Ds}\left(A^j_{;l}(t,s)\frac{dx^l}{dt}\Delta t\right)\Delta s \\
&= A^j(t,s) + \frac{DA^j}{Dt}\Delta t + \frac{DA^j}{Ds}\Delta s + A^j_{;l;m}(t,s)\frac{dx^m}{ds}\frac{dx^l}{dt}\Delta t\,\Delta s.
\end{aligned}$$

Since $A^j$ is assumed to be parallel translated on both curves, $DA^j/Dt = 0 = DA^j/Ds$, and

$$A^j(t+\Delta t, s+\Delta s)_1 = A^j(t,s) + A^j_{;l;m}(t,s)\,\Delta x^l\Delta x^m,$$

where we used $\Delta x^l \approx (dx^l/dt)\Delta t$ and $\Delta x^m \approx (dx^m/ds)\Delta s$. The subscript 1 on the left-hand side stands for the “first route.” The “second route” is going up first and then to the right. It should be clear that the only difference in the final result is the interchange of $l$ and $m$. We therefore have

$$A^j(t+\Delta t, s+\Delta s)_2 = A^j(t,s) + A^j_{;m;l}(t,s)\,\Delta x^l\Delta x^m.$$

Thus, writing $A^j_{;lm}$ for the second covariant derivative $A^j_{;l;m}$, we have

$$A^j(t+\Delta t, s+\Delta s)_1 - A^j(t+\Delta t, s+\Delta s)_2 = \left(A^j_{;lm} - A^j_{;ml}\right)\Delta x^l\Delta x^m. \qquad (17.66)$$

The difference in parentheses should be related to the curvature of the space (manifold) under consideration.

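Finding this difference is straightforward. Using the rule of covariant differentiation for general tensors, we get

$$\begin{aligned}
A^j_{;lm} &= \frac{\partial A^j_{;l}}{\partial x^m} + \Gamma^j_{km}A^k_{;l} - \Gamma^p_{lm}A^j_{;p} \\
&= \frac{\partial}{\partial x^m}\left(\frac{\partial A^j}{\partial x^l} + \Gamma^j_{kl}A^k\right) + \Gamma^j_{km}\left(\frac{\partial A^k}{\partial x^l} + \Gamma^k_{rl}A^r\right) - \Gamma^p_{lm}A^j_{;p} \\
&= \frac{\partial^2 A^j}{\partial x^m\,\partial x^l} + \frac{\partial\Gamma^j_{kl}}{\partial x^m}A^k + \Gamma^j_{kl}\frac{\partial A^k}{\partial x^m} + \Gamma^j_{km}\frac{\partial A^k}{\partial x^l} + \Gamma^j_{km}\Gamma^k_{rl}A^r - \Gamma^p_{lm}A^j_{;p}.
\end{aligned}$$

In the last line switch $l$ and $m$ to get $A^j_{;ml}$:

$$A^j_{;ml} = \frac{\partial^2 A^j}{\partial x^l\,\partial x^m} + \frac{\partial\Gamma^j_{km}}{\partial x^l}A^k + \Gamma^j_{km}\frac{\partial A^k}{\partial x^l} + \Gamma^j_{kl}\frac{\partial A^k}{\partial x^m} + \Gamma^j_{kl}\Gamma^k_{rm}A^r - \Gamma^p_{ml}A^j_{;p}.$$

Subtracting, and changing the dummy indices when necessary, we obtain

$$A^j_{;lm} - A^j_{;ml} = \left(\frac{\partial\Gamma^j_{kl}}{\partial x^m} - \frac{\partial\Gamma^j_{km}}{\partial x^l} + \Gamma^j_{rm}\Gamma^r_{kl} - \Gamma^j_{rl}\Gamma^r_{km}\right)A^k - \left(\Gamma^p_{lm} - \Gamma^p_{ml}\right)A^j_{;p}. \qquad (17.67)$$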
It is straightforward but tedious to show that the expression in the first pair of parentheses transforms as a component of a tensor of type (1,3). This tensor is denoted by $R^j_{klm}$ and is called the Riemann curvature tensor:

$$R^j_{klm} = \frac{\partial\Gamma^j_{kl}}{\partial x^m} - \frac{\partial\Gamma^j_{km}}{\partial x^l} + \Gamma^j_{rm}\Gamma^r_{kl} - \Gamma^j_{rl}\Gamma^r_{km}. \qquad (17.68)$$


The expression in the second pair of parentheses in (17.67) is twice the torsion tensor introduced earlier [see Equation (17.53) and the discussion after it].

Example 17.5.1. Example 17.4.1 showed that the metric connections of Euclidean space and of special relativistic spacetime in Cartesian coordinates are both zero. Equation (17.68) shows that for these spaces, the Riemann curvature tensor expressed in Cartesian coordinates is zero. Since the Riemann curvature tensor is a tensor, it must be zero in all coordinates, as expressed in Box 17.2.2. Spaces that have zero Riemann curvature tensor are called flat. We thus see that flatness is an intrinsic property of a space, independent of any coordinates used in that space. ■

The curvature tensor has some important properties, which we state without proof. One property that is evident from (17.68) is

$$R^j_{klm} = -R^j_{kml}. \qquad (17.69)$$

The second property, which is true only if the torsion tensor vanishes, i.e., when the affine connection is symmetric, is

$$R^j_{klm} + R^j_{lmk} + R^j_{mkl} = 0. \qquad (17.70)$$

The third property, which involves the covariant derivative of the curvature tensor and is true only for torsion-free connections, is

$$R^j_{klm;n} + R^j_{kmn;l} + R^j_{knl;m} = 0. \qquad (17.71)$$

This is called the Bianchi identity. The last property, which holds for the Riemann tensor of the metric connection, is that in $n$ dimensions $R^j_{klm}$ has only $n^2(n^2-1)/12$ independent components.

Various other tensors can be obtained from the Riemann curvature tensor by contraction. For example, by contracting the contravariant index with the last covariant index one obtains the so-called Ricci tensor:

$$R_{kl} \equiv R^j_{klj} = -R^j_{kjl} = \frac{\partial\Gamma^j_{kl}}{\partial x^j} - \frac{\partial\Gamma^j_{kj}}{\partial x^l} + \Gamma^j_{rj}\Gamma^r_{kl} - \Gamma^j_{rl}\Gamma^r_{kj}, \qquad (17.72)$$

and by raising one of the Ricci tensor’s indices and contracting, we obtain the scalar curvature:

$$R = R^k_k = g^{kl}R_{kl}. \qquad (17.73)$$

Einstein’s general theory of relativity explains gravity as a manifestation of the curvature of spacetime. Since gravity is caused by mass, and since mass and energy are equivalent, the source of curvature is energy. Pursuing this idea, Einstein came up with an equation, the Einstein equation, that describes all (large scale) gravitational interactions. Defining the Einstein curvature tensor as

$$G_{ij} = R_{ij} - \tfrac{1}{2}g_{ij}R, \qquad (17.74)$$

the Einstein equation is written as

$$G_{ij} = 8\pi G\,T_{ij}, \qquad (17.75)$$

where $G$ is the universal gravitational constant and $T_{ij}$ is the energy momentum tensor.

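Example 17.5.2. For the sphere of Example 17.4.2, the Ricci curvature tensor can be written as

$$R_{kl} = \frac{\partial\Gamma^\theta_{kl}}{\partial\theta} + \frac{\partial\Gamma^\varphi_{kl}}{\partial\varphi} - \frac{\partial\Gamma^\theta_{k\theta}}{\partial x^l} - \frac{\partial\Gamma^\varphi_{k\varphi}}{\partial x^l} + \Gamma^\theta_{r\theta}\Gamma^r_{kl} + \Gamma^\varphi_{r\varphi}\Gamma^r_{kl} - \Gamma^\theta_{rl}\Gamma^r_{k\theta} - \Gamma^\varphi_{rl}\Gamma^r_{k\varphi}.$$

Using this, it is easy to show that $R_{\theta\varphi} = 0 = R_{\varphi\theta}$, while

$$R_{\theta\theta} = 1, \qquad R_{\varphi\varphi} = \sin^2\theta.$$

Furthermore, since $g^{\theta\theta} = 1/a^2$ and $g^{\varphi\varphi} = 1/(a^2\sin^2\theta)$, the scalar curvature becomes

$$R = g^{kl}R_{kl} = g^{\theta\theta}R_{\theta\theta} + g^{\varphi\varphi}R_{\varphi\varphi} = \frac{2}{a^2},$$

showing that a sphere is a space of constant (and positive) curvature, as we expect. ■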
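Example 17.5.2 can be reproduced numerically as well. The Python sketch below takes the nonzero connection components of Example 17.4.2, evaluates (17.68) with central finite differences for the derivatives of $\Gamma$, contracts to the Ricci tensor (17.72), and forms the scalar curvature (17.73). The radius and sample point are arbitrary. As an extra check it also evaluates the Einstein tensor (17.74), which vanishes identically in two dimensions.

```python
import math

a = 2.0                  # sphere radius (arbitrary)
N = 2                    # index 0 = theta, 1 = phi

def christoffel(q):
    """Nonzero metric-connection components of Example 17.4.2;
    G[j][k][l] = Gamma^j_{kl}."""
    th = q[0]
    G = [[[0.0] * N for _ in range(N)] for _ in range(N)]
    G[0][1][1] = -math.sin(th) * math.cos(th)
    G[1][0][1] = G[1][1][0] = math.cos(th) / math.sin(th)
    return G

def d_christoffel(q, m, h=1e-5):
    """Central-difference approximation of d Gamma^j_{kl} / d x^m."""
    qp, qm = list(q), list(q)
    qp[m] += h
    qm[m] -= h
    Gp, Gm = christoffel(qp), christoffel(qm)
    return [[[(Gp[j][k][l] - Gm[j][k][l]) / (2 * h) for l in range(N)]
             for k in range(N)] for j in range(N)]

def riemann(q):
    """R^j_{klm} from Eq. (17.68); returns R[j][k][l][m]."""
    G = christoffel(q)
    dG = [d_christoffel(q, m) for m in range(N)]   # dG[m][j][k][l]
    R = [[[[0.0] * N for _ in range(N)] for _ in range(N)] for _ in range(N)]
    for j in range(N):
        for k in range(N):
            for l in range(N):
                for m in range(N):
                    R[j][k][l][m] = (dG[m][j][k][l] - dG[l][j][k][m]
                                     + sum(G[j][r][m] * G[r][k][l]
                                           - G[j][r][l] * G[r][k][m]
                                           for r in range(N)))
    return R

th = 0.9                                            # arbitrary sample point
R = riemann((th, 0.4))
ricci = [[sum(R[j][k][l][j] for j in range(N)) for l in range(N)]
         for k in range(N)]                         # R_kl = R^j_{klj}, (17.72)
g = [[a**2, 0.0], [0.0, (a * math.sin(th))**2]]
g_inv = [[1 / g[0][0], 0.0], [0.0, 1 / g[1][1]]]
scalar = sum(g_inv[k][l] * ricci[k][l] for k in range(N) for l in range(N))
einstein = [[ricci[k][l] - 0.5 * g[k][l] * scalar for l in range(N)]
            for k in range(N)]                      # G_kl, (17.74)
print(ricci, scalar, 2 / a**2)
```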


17.6 Problems 

17.1. Write $\partial_i x_j$ in a form that includes the Kronecker delta. Now show that $\nabla\cdot\mathbf{r} = 3$.

17.2. Recall that a homogeneous function $f$ of $n$ variables of degree $q$ satisfies

$$q f(x_1, x_2, \ldots, x_n) = \sum_{i=1}^{n} x_i\,\partial_i f.$$

(a) Differentiate both sides with respect to $x_j$ and show that

$$(q - 1)\,\partial_j f(x_1, x_2, \ldots, x_n) = \sum_{i=1}^{n} x_i\,\partial_i\partial_j f.$$

(b) Multiply this equation by $x_j$ and sum over $j$ to obtain

$$q(q - 1)\,f(x_1, x_2, \ldots, x_n) = \sum_{i,j=1}^{n} x_i x_j\,\partial_i\partial_j f.$$


17.3. Verify Equation (17.23).

17.4. Let the scalar function $\phi$ be given by $\phi(x, y, z) = x^2 + y^3 + z$ and

$$x = \sin x' + \cos y' + z', \qquad y = x'y' + z', \qquad z = x'^2.$$

What is the functional form of $\phi'$?

17.5. Show that the sum of two tensors of type $(r, s)$ is a tensor of the same type.


17.6. Derive Equation (17.29). Show that = 1. 

17.7. Show that the inverse of a metric tensor given by

$$g^{km}(x) \equiv \sum_{p=1}^{n}\frac{\partial x^k}{\partial x'^p}\,\frac{\partial x^m}{\partial x'^p}$$

is a tensor of type (2,0). Here $\{x'^p\}$ are as defined in the beginning of Section 17.3.


17.8. Following Example 17.3.1, find the metric tensor for cylindrical coordinates.

17.9. Show that the dot products of Equations (17.36) and (17.37) do not 
change in a general coordinate transformation. 

17.10. Verify Equation (17.40) component by component. 

17.11. Using indices, show that the divergence of a curl and the curl of a 
gradient are both zero. 

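17.12. Using indices, prove the following “derivative” identities:

$$\nabla\cdot(f\mathbf{A}) = (\nabla f)\cdot\mathbf{A} + f\,\nabla\cdot\mathbf{A},$$

$$\nabla\times(f\mathbf{A}) = (\nabla f)\times\mathbf{A} + f\,\nabla\times\mathbf{A},$$

$$\nabla(fg) = g\nabla f + f\nabla g.$$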

17.13. Using indices, prove Green’s identity:

$$\nabla\cdot(g\nabla f - f\nabla g) = g\nabla^2 f - f\nabla^2 g.$$

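17.14. Prove the following vector identities using index notation for vectors:

$$\nabla\cdot(\mathbf{A}\times\mathbf{B}) = \mathbf{B}\cdot\nabla\times\mathbf{A} - \mathbf{A}\cdot\nabla\times\mathbf{B},$$

$$\nabla\times(\nabla\times\mathbf{A}) = \nabla(\nabla\cdot\mathbf{A}) - \nabla^2\mathbf{A}.$$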

17.15. Show that the difference between any two affine connections is a tensor 
of type (1,2). 
17.16. Verify that Equation (17.46) combines the second and third of Maxwell’s equations.

17.17. Verify that $F_{\alpha\beta} = \partial_\beta A_\alpha - \partial_\alpha A_\beta$ satisfies Equation (17.46).

17.18. Differentiate both sides of Equation (17.45) with respect to and 
raise the index $\beta$ to be able to sum over it; use the symmetry of second derivatives and the antisymmetry of $F_{\alpha\beta}$ to show that the left-hand side is zero. On the right-hand side, you should have something like $\mu_0\eta^{\beta\alpha}\partial_\alpha J_\beta$. Show that $\eta^{\beta\alpha}\partial_\alpha J_\beta = 0$ expresses charge conservation, or the continuity equation of Box 13.2.4.


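17.19. With $c = 1$ and $\mu_0 = 1/\epsilon_0$, show that $\eta^{\alpha\nu}\partial_\nu A_\alpha = 0$ is the Lorentz gauge condition [Equation (15.32)]

$$\frac{\partial\phi}{\partial t} + \nabla\cdot\mathbf{A} = 0,$$

and that $\eta^{\alpha\nu}\partial_\nu\partial_\alpha A_\beta = \mu_0 J_\beta$ combines the two wave equations [Equations (15.33) and (15.34)]

$$\frac{\partial^2\mathbf{A}}{\partial t^2} - \nabla^2\mathbf{A} = \mu_0\mathbf{J}, \qquad \frac{\partial^2\phi}{\partial t^2} - \nabla^2\phi = \frac{\rho}{\epsilon_0}.$$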


17.20. Show that DAj of Box 17.4.1 is a covariant vector. 

17.21. Show that

$$\frac{\partial A^i}{\partial x^l} + \Gamma^i_{kl}A^k$$

is a tensor of type (1,1).

17.22. Show that $\Gamma^i_{jk}$ given in Equation (17.59) is an affine connection, i.e., that it transforms according to Equation (17.52).

17.23. Show that the metric connection satisfies Equation (17.61). 


17.24. (a) Find all the components of the affine metric connection on the 
surface of the sphere of Example 17.4.2. 

(b) Derive Equation (17.62) from Equation (17.58). 

(c) Show that (17.63) satisfies the second equation of (17.62). 

(d) Show that $\cot\theta = A\cos\varphi + B\sin\varphi$ is a solution of (17.65).


17.25. Show that the Riemann curvature tensor of Equation (17.68) is a 
tensor of type (1,3). 


17.26. Example 17.5.1 showed that the Riemann curvature tensor of Euclidean space, when expressed in Cartesian coordinates, is zero. Since the Riemann curvature tensor is a tensor, it should be zero when expressed in any coordinate system. Starting with the spherical components of the Euclidean metric obtained in Example 17.3.1, find the components of the metric connection in spherical coordinates. From these, calculate the components of the Riemann curvature tensor and show that they all vanish.

17.27. Derive the expression for the Ricci curvature tensor of Example 17.5.2 and show that

$$R_{\theta\varphi} = 0 = R_{\varphi\theta}, \qquad R_{\theta\theta} = 1, \qquad R_{\varphi\varphi} = \sin^2\theta.$$




Part V 

Complex Analysis 



Chapter 18 

Complex Arithmetic 


Complex numbers were developed because there was a need to expand the notion of numbers to include solutions of algebraic equations whose prototype is $x^2 + 1 = 0$. Such developments are not atypical in the history of mathematics. The invention of irrational numbers occurred because of a need for a number that could solve an equation of the form $x^2 - 2 = 0$. Similarly, rational numbers were the offspring of the operations of multiplication and division and the quest for a number that gives, for example, 4 when multiplied by 3, or, equivalently, a number that solves the equation $3x - 4 = 0$.

There is a crucial difference between complex numbers and all the numbers mentioned above: All rational, irrational, and, in general, real numbers correspond to measurable physical quantities. However, there is no single measurable physical quantity that can be described by a complex number.

A natural question then is this: What need is there for complex numbers 
if no physical quantity can be measured in terms of them? The answer is that 
although no single physical quantity can be expressed in terms of complex 
numbers, a pair of physical quantities can be neatly described by a single 
complex number. For example, a wave with a given amplitude and phase can 
be concisely described by a complex number. Another, more fundamental, 
reason is that equations that describe the behavior of subatomic particles are 
inherently complex. 


18.1 Cartesian Form of Complex Numbers 

We demand a number system broad enough to include solutions to the equation

$$x^2 + 1 = 0 \quad\text{or}\quad x^2 = -1.$$

Clearly the solution(s) cannot be real, because the square of a real number is never negative, and we want $x^2$ to be negative. So we broaden the concept of numbers by considering complex numbers. Such numbers are of the form

$$z = x + iy \quad\text{with}\quad i = \sqrt{-1} \quad\text{and}\quad i^2 = -1. \qquad (18.1)$$

It turns out that we don’t need to introduce any other numbers to solve all algebraic equations, i.e., equations of the form $p(x) = 0$ with $p(x)$ a polynomial. In fact, the fundamental theorem of algebra, to which we shall return, states that all roots of any algebraic equation

$$a_n x^n + a_{n-1}x^{n-1} + \cdots + a_1 x + a_0 = 0$$

with arbitrary real or complex coefficients $a_0, a_1, \ldots, a_n$ are in the complex number system. In this sense, then, the complex number system is the most complete system.

A complex number can be conveniently represented as a point (or equivalently, as a vector) in the $xy$-plane, called the complex plane, as shown in Figure 18.1. In Equation (18.1), $x$ is called the real part of $z$, written $\mathrm{Re}(z)$, and $y$ is called the imaginary part of $z$, written $\mathrm{Im}(z)$. Similarly, the horizontal axis in Figure 18.1 is named the real axis, and the vertical axis is named the imaginary axis. The set of all complex numbers (or the set of points in the complex plane) is denoted by $\mathbb{C}$.

We can define various operations on $\mathbb{C}$ that are extensions of similar operations on the real number system, $\mathbb{R}$. The only proviso is that $i^2 = -1$, and that the final form of an equation must be written as in Equation (18.1), with real and imaginary parts. For instance, the sum of two complex numbers, $z_1 = x_1 + iy_1$ and $z_2 = x_2 + iy_2$, is

$$z_1 + z_2 = (x_1 + x_2) + i(y_1 + y_2).$$

This sum can be represented in the complex plane as the vector sum of $z_1$ and $z_2$, as shown in Figure 18.2. The product of $z_1$ and $z_2$ can also be obtained:

$$\begin{aligned}
z_1z_2 &= (x_1 + iy_1)(x_2 + iy_2) = x_1x_2 + x_1(iy_2) + iy_1x_2 + iy_1(iy_2)\\
&= x_1x_2 + i(x_1y_2 + y_1x_2) - y_1y_2 = x_1x_2 - y_1y_2 + i(x_1y_2 + y_1x_2).
\end{aligned}$$

Thus,

$$\mathrm{Re}(z_1z_2) = x_1x_2 - y_1y_2, \qquad \mathrm{Im}(z_1z_2) = x_1y_2 + x_2y_1. \qquad (18.2)$$



Figure 18.1: Complex numbers as points or vectors in a plane. 
Figure 18.2: Addition of complex numbers as addition of vectors.

To obtain this equation, we have implicitly used the fact that two complex 
numbers are equal if and only if their real parts are equal and their imaginary 
parts are equal. 
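Rule (18.2) uses only real arithmetic, so it can be checked directly against Python's built-in `complex` type. The helper name `multiply` is ours, used for illustration.

```python
def multiply(z1: complex, z2: complex) -> complex:
    """Product of two complex numbers using only real arithmetic, Eq. (18.2)."""
    x1, y1 = z1.real, z1.imag
    x2, y2 = z2.real, z2.imag
    return complex(x1 * x2 - y1 * y2, x1 * y2 + x2 * y1)


z1, z2 = 2 - 1j, 3 + 2j
print(multiply(z1, z2), z1 * z2)  # (8+1j) (8+1j)
```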

The factor $i$ in $z$ allows new operations for complex numbers that do not exist for real numbers. One such operation is complex conjugation. The complex conjugate, $z^*$ or $\bar{z}$, of $z$ is defined as

$$z^* = \bar{z} = (x + iy)^* = x - iy, \qquad (18.3)$$

which is obtained from $z$ by replacing $i$ with $-i$. We note immediately that

$$zz^* = (x + iy)(x - iy) = x^2 + y^2 = z^*z,$$

which is a nonnegative real number. The nonnegative square root of $zz^*$ is called the absolute value of $z$ and denoted by $|z|$. It is simply the length of the vector representing $z$ in the $xy$-plane. Thus, we have

$$|z| = \sqrt{zz^*} = \sqrt{z^*z} = \sqrt{x^2 + y^2} = \sqrt{(\mathrm{Re}(z))^2 + (\mathrm{Im}(z))^2}. \qquad (18.4)$$

We can also define the division of two complex numbers using complex 
conjugation. 


Box 18.1.1. To find the real and imaginary parts of a quotient, multiply the numerator and denominator by the complex conjugate of the denominator.


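So, for the ratio $z_1/z_2$, we get

$$\frac{z_1}{z_2} = \frac{z_1z_2^*}{z_2z_2^*} = \frac{(x_1 + iy_1)(x_2 - iy_2)}{|z_2|^2} = \frac{x_1x_2 + y_1y_2 + i(y_1x_2 - x_1y_2)}{|z_2|^2} = \frac{x_1x_2 + y_1y_2}{|z_2|^2} + i\,\frac{y_1x_2 - x_1y_2}{|z_2|^2}.$$

Thus,

$$\mathrm{Re}\left(\frac{z_1}{z_2}\right) = \frac{x_1x_2 + y_1y_2}{x_2^2 + y_2^2} \quad\text{and}\quad \mathrm{Im}\left(\frac{z_1}{z_2}\right) = \frac{y_1x_2 - x_1y_2}{x_2^2 + y_2^2}. \qquad (18.5)$$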
In particular,

$$\frac{1}{z} = \frac{z^*}{|z|^2} = \frac{x - iy}{x^2 + y^2} \quad\text{and}\quad \frac{1}{i} = -i.$$
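The recipe of Box 18.1.1 translates line for line into code. The sketch below is only an illustration (the name `divide` is ours). It computes $z_1/z_2$ through Equation (18.5) with real arithmetic and reproduces two results from this section.

```python
def divide(z1: complex, z2: complex) -> complex:
    """z1/z2 as in Box 18.1.1: multiply numerator and denominator by z2*, Eq. (18.5)."""
    x1, y1 = z1.real, z1.imag
    x2, y2 = z2.real, z2.imag
    denom = x2 * x2 + y2 * y2          # |z2|^2 = z2 z2*, a real number
    if denom == 0:
        raise ZeroDivisionError("division by the complex number zero")
    return complex((x1 * x2 + y1 * y2) / denom, (y1 * x2 - x1 * y2) / denom)


print(divide(2 + 1j, 3 - 1j))   # (0.5+0.5j), i.e. (1 + i)/2
print(divide(1, 1j))            # -1j, i.e. 1/i = -i
```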

Some useful properties of absolute values are as follows:

$$|z_1z_2| = |z_1|\,|z_2|, \qquad \left|\frac{z_1}{z_2}\right| = \frac{|z_1|}{|z_2|}, \qquad \big||z_1| - |z_2|\big| \le |z_1 + z_2| \le |z_1| + |z_2|. \qquad (18.6)$$

This last inequality is called the triangle inequality, and it comes directly from the vector property of complex numbers. The right half of it can be generalized to more than two complex numbers:

$$\left|\sum_{k=1}^{n} z_k\right| \le \sum_{k=1}^{n} |z_k|. \qquad (18.7)$$
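A random spot check makes Equations (18.6) and (18.7) concrete. The Python sketch below is an illustration, not a proof. It samples complex numbers with components in $[-5, 5]$ (an arbitrary choice) and tests each relation with a small tolerance that absorbs floating-point rounding.

```python
import math
import random


def check_absolute_value_properties(trials: int = 1000, seed: int = 0) -> bool:
    """Spot-check Eqs. (18.6) and (18.7) on random complex numbers."""
    rng = random.Random(seed)
    tol = 1e-12
    for _ in range(trials):
        zs = [complex(rng.uniform(-5, 5), rng.uniform(-5, 5)) for _ in range(4)]
        z1, z2 = zs[0], zs[1]
        # |z1 z2| = |z1||z2| and |z1/z2| = |z1|/|z2|
        if not math.isclose(abs(z1 * z2), abs(z1) * abs(z2), rel_tol=tol):
            return False
        if not math.isclose(abs(z1 / z2), abs(z1) / abs(z2), rel_tol=tol):
            return False
        # ||z1| - |z2|| <= |z1 + z2| <= |z1| + |z2|
        if not abs(abs(z1) - abs(z2)) - tol <= abs(z1 + z2) <= abs(z1) + abs(z2) + tol:
            return False
        # Eq. (18.7) for four numbers
        if abs(sum(zs)) > sum(abs(z) for z in zs) + tol:
            return False
    return True


print(check_absolute_value_properties())
```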


Example 18.1.1. Here we present some sample manipulations with complex numbers:

$$\begin{aligned}
(1 + i)^2 &= (1)^2 + (i)^2 + 2i = 1 - 1 + 2i = 2i,\\
\frac{1}{1 - i} - \frac{1}{1 + i} &= \frac{1 + i - (1 - i)}{(1 - i)(1 + i)} = \frac{2i}{|1 + i|^2} = i,\\
(1 + i)^{-4} &= \frac{1}{(1 + i)^2(1 + i)^2} = \frac{1}{(2i)(2i)} = -\frac{1}{4},\\
\frac{2 + i}{3 - i} &= \frac{(2 + i)(3 + i)}{|3 - i|^2} = \frac{5 + 5i}{3^2 + (-1)^2} = \frac{1 + i}{2},\\
\left|\frac{2i - 1}{i - 2}\right| &= \frac{|-1 + 2i|}{|-2 + i|} = \frac{\sqrt{(-1)^2 + 2^2}}{\sqrt{(-2)^2 + 1^2}} = 1.
\end{aligned}$$

The equation $|z - a| = b$, where $a$ is a fixed complex number and $b$ is real and positive, describes a circle of radius $b$ with center at $a = a_x + ia_y$. This is easily seen because

$$b^2 = |z - a|^2 = |(x + iy) - (a_x + ia_y)|^2 = |(x - a_x) + i(y - a_y)|^2 = (x - a_x)^2 + (y - a_y)^2.$$

We note that $|z - a|$ is the distance between the two complex numbers $z$ and $a$. Therefore, $|z - a| = b$, with $a$ a constant and $z$ a variable, is the collection of all points $z$ that are at a distance $b$ from $a$. ■


Complex conjugation satisfies some nice properties that we list below:

$$\begin{gathered}
(z_1 + z_2)^* = z_1^* + z_2^*, \qquad (z_1z_2)^* = z_1^*z_2^*, \qquad \left(\frac{z_1}{z_2}\right)^* = \frac{z_1^*}{z_2^*},\\
\mathrm{Re}(z) = \tfrac{1}{2}(z + z^*), \qquad \mathrm{Im}(z) = \tfrac{1}{2i}(z - z^*),\\
(z^*)^* = z, \qquad (z^n)^* = (z^*)^n.
\end{gathered} \qquad (18.8)$$
The complex conjugate of a function of $z$ is easily obtained by substituting $z^*$ for $z$ in that function.¹ This can be summarized as

$$[f(z)]^* = f(z^*), \qquad (18.9)$$

which is equivalent to replacing every $i$ with $-i$ in the expression for $f(z)$.

In the first half of the sixteenth century there was hardly any change from the 
attitude or spirit of Arabs, whose work had put practical arithmetical calculations 
in the forefront of mathematics, but merely an increase in the kind of activity 
Europeans had learned from Arabs. Moreover, the technological advances spurred by 
the Renaissance demanded further refinement in magnitudes such as trigonometric 
tables and astronomical observations. 

By 1500 or so, zero was accepted as a number and irrational numbers were used more freely in calculations. However, the problem of whether irrationals were really numbers still troubled people. Michael Stifel (1486?-1567), the German mathematician, argued that

Since, in proving geometrical figures, when rational numbers fail us irrational 
numbers ... prove exactly those things which rational numbers could not prove 
... we are compelled to assert that they truly are numbers .... On the other 
hand, ... that cannot be called a true number which is of such a nature that 
it lacks precision [decimal representation]. 

He then argues that only whole numbers or fractions can be called true numbers, and 
since irrationals are neither, they are not real numbers. Even a century later, Pascal, 
Barrow, and Newton thought of irrational numbers as being understood in terms of 
geometric magnitude; they were mere symbols that had no existence independent 
of continuous geometrical magnitude. 

Negative numbers were treated with equal suspicion by the sixteenth- and seventeenth-century mathematicians. They were considered “absurd.” Jerome Cardan (1501-1576), the great Italian mathematician of the Renaissance, was willing to accept the negative numbers as roots of equations, but considered them as “fictitious,” while he called the positive roots real. Francois Vieta (1540-1603), a lawyer by profession but recognized far more as the foremost mathematician of the sixteenth century, discarded negative numbers entirely. Descartes accepted them in part, but called negative roots of equations false, on the grounds that they represented numbers less than nothing.

An interesting argument against negative numbers was given by Antoine Arnauld 
(1612-1694), a theologian and mathematician who was a close friend of Pascal. 
Arnauld questioned the equality $-1 : 1 = 1 : (-1)$ because, he said, $-1$ is less than $+1$; hence, how could a smaller number be to a greater as a greater is to a smaller?

Without having fully overcome their difficulties with irrational and negative numbers, the Europeans were hit by another problem: the complex numbers! They obtained these new numbers by extending the arithmetic operation of square root to whatever numbers appeared in solving quadratic equations. Thus Cardan sets up and solves the problem of dividing 10 into two parts whose product is 40. The equation is $x(10 - x) = 40$, for which he obtains the roots $5 \pm \sqrt{-15}$, and then he says “Putting aside the mental torture involved,” multiply these two roots and note that the product is $25 - (-15)$, or 40. He then states, “So progresses arithmetic subtlety the end of which, as is said, is as refined as it is useless.”

Descartes also rejected complex roots and coined the term “imaginary” for them. Even Newton did not regard complex roots as significant, most likely because in his day they lacked physical meaning. The confusion surrounding complex numbers is illustrated by the oft-quoted statement by Leibniz, “The Divine Spirit found a sublime outlet in that wonder of analysis, that portent of the ideal world, that amphibian between being and not being, which we call the imaginary root of negative unity.”

¹This statement is not strictly true for all functions. However, only a mild restriction is to be imposed on them for the statement to be true. We shall not go into details of such restrictions because they require certain complex analytic tools which go beyond the scope of this book. See Hassani, S. Mathematical Physics: A Modern Introduction to Its Foundations, Springer-Verlag, 1999, Chapter 11 for details.


18.2 Polar Form of Complex Numbers 

The introduction of polar coordinates in the complex plane makes available 
a powerful tool with which to facilitate complex manipulations. Figure 18.3 
shows a complex number and its polar coordinates. In terms of these polar 
coordinates, z can be written as 


$$z = x + iy = r\cos\theta + ir\sin\theta = r(\cos\theta + i\sin\theta). \qquad (18.10)$$

Assuming that series of complex numbers can be manipulated as those of real numbers,² we obtain a useful relation between imaginary exponentials and trigonometric functions.

In Chapter 10 we presented the Maclaurin series for the exponential and trigonometric functions. Let us assume that those series are valid for complex numbers as well. Then, we have


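$$\begin{aligned}
e^{i\theta} &= \sum_{n=0}^{\infty}\frac{(i\theta)^n}{n!} = \sum_{n\ \text{even}}\frac{(i\theta)^n}{n!} + \sum_{n\ \text{odd}}\frac{(i\theta)^n}{n!} = \sum_{k=0}^{\infty}\frac{(i\theta)^{2k}}{(2k)!} + \sum_{k=0}^{\infty}\frac{(i\theta)^{2k+1}}{(2k+1)!}\\
&= \sum_{k=0}^{\infty}(-1)^k\frac{\theta^{2k}}{(2k)!} + i\sum_{k=0}^{\infty}(-1)^k\frac{\theta^{2k+1}}{(2k+1)!} = \cos\theta + i\sin\theta \qquad (18.11)
\end{aligned}$$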



Figure 18.3: Complex numbers in polar coordinates. 

2 This assumption turns out to be correct. In particular, the power series expansion used 
in the following example plays a central role in complex analysis. 
because $i^{2k} = (i^2)^k = (-1)^k$. This is probably the most important relation in complex number theory.


Box 18.2.1. The trigonometric and imaginary exponential functions are related by the Euler equation: $e^{i\theta} = \cos\theta + i\sin\theta$.
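The derivation of (18.11) rests on summing the exponential series for an imaginary argument. The Python sketch below (helper name ours) sums that series directly with complex arithmetic and compares the result with $\cos\theta + i\sin\theta$. The number of terms is an arbitrary choice that is ample for $|\theta| \le 3$.

```python
import cmath
import math


def exp_series(z: complex, terms: int = 40) -> complex:
    """Partial sum of the Maclaurin series sum_{n>=0} z^n/n!, evaluated for complex z."""
    total, term = 0j, 1 + 0j          # term holds z^n / n!, starting at n = 0
    for n in range(terms):
        total += term
        term *= z / (n + 1)           # z^(n+1)/(n+1)! from z^n/n!
    return total


for theta in (0.0, 0.8, math.pi / 2, 3.0):
    series = exp_series(1j * theta)
    euler = complex(math.cos(theta), math.sin(theta))
    print(theta, series, abs(series - euler), abs(series - cmath.exp(1j * theta)))
```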


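The use of Equation (18.11) in (18.10) leads to another way of representing complex numbers:

$$z = re^{i\theta}, \qquad r = \sqrt{x^2 + y^2}, \qquad \theta = \tan^{-1}\frac{y}{x}. \qquad (18.12)$$

Note that


Box 18.2.2. The angle $\theta$ is not uniquely determined: Any multiple of $2\pi$ can be added to it without affecting $z$.


We can use Equation (18.12) together with $x = r\cos\theta$, $y = r\sin\theta$ to convert from Cartesian coordinates to polar coordinates, and vice versa. Because $\tan^{-1}(y/x)$ fixes $\theta$ only up to a multiple of $\pi$, the signs of $x$ and $y$ must be used to place $\theta$ in the correct quadrant. The coordinate $\theta$ is called the argument of $z$ and written $\theta = \arg(z)$.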

Example 18.2.1. Let us look at some numerical examples of polar-Cartesian conversion. In many cases, a diagram can be very helpful. For instance, take $i$, whose real part is obviously zero and whose imaginary part is 1. If we were to use the formula, we would have $\tan\theta = 1/0$, which is not defined. However, Figure 18.4 shows that $z = i$ lies on the positive imaginary axis, and, thus, $\theta = \pi/2$. Since we can always add a multiple of $2\pi$ to the angle, we have

$$i = e^{i\pi/2 + i2n\pi}, \qquad n = 0, \pm 1, \pm 2, \ldots.$$

Similarly, the same figure makes it clear that

$$-i = e^{-i\pi/2 + i2n\pi} = e^{i3\pi/2 + i2n\pi}, \qquad n = 0, \pm 1, \pm 2, \ldots.$$

Figure 18.4: Cartesian and polar coordinates for $i$ and $-i$.
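Figure 18.5: Cartesian and polar coordinates for some other complex numbers.

Referring to Figure 18.5, the reader may verify the following polar representations of complex numbers:

$$\begin{aligned}
-1 &= e^{i\pi + i2n\pi},\\
1 + i &= \sqrt{2}\,e^{i\pi/4 + i2n\pi},\\
1 - i &= \sqrt{2}\,e^{-i\pi/4 + i2n\pi} = \sqrt{2}\,e^{i7\pi/4 + i2n\pi},\\
2 + 3i &= \sqrt{13}\,e^{i\tan^{-1}(3/2) + i2n\pi} = \sqrt{13}\,e^{i0.983 + i2n\pi},\\
-1 + 2i &= \sqrt{5}\,e^{i(\pi - \tan^{-1}2) + i2n\pi} = \sqrt{5}\,e^{i2.03 + i2n\pi}.
\end{aligned}$$

In all cases, $n$ is an integer and angles are in radians. The last case shows why a diagram helps: $-1 + 2i$ lies in the second quadrant, so its angle is not the principal value $\tan^{-1}(-2) \approx -1.11$. ■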
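The quadrant issue in the last case is exactly what the two-argument arctangent handles. The Python sketch below is an illustration with helper names of our choosing. It converts with `math.atan2`, which uses the signs of both $x$ and $y$ and returns $\theta$ in $(-\pi, \pi]$, and contrasts that with the naive $\tan^{-1}(y/x)$.

```python
import cmath
import math


def to_polar(z: complex) -> tuple[float, float]:
    """(r, theta) of Eq. (18.12), with theta taken in (-pi, pi].

    math.atan2(y, x) uses the signs of x and y to pick the quadrant;
    math.atan(y / x) alone cannot, and fails outright when x = 0.
    """
    return math.hypot(z.real, z.imag), math.atan2(z.imag, z.real)


def from_polar(r: float, theta: float) -> complex:
    """z = r e^{i theta} = r cos(theta) + i r sin(theta)."""
    return complex(r * math.cos(theta), r * math.sin(theta))


for z in (1j, -1j, -1, 1 + 1j, 1 - 1j, 2 + 3j, -1 + 2j):
    r, theta = to_polar(z)
    assert cmath.isclose(from_polar(r, theta), z, abs_tol=1e-12)
    print(f"{z!s:>8}: r = {r:.4f}, theta = {theta:.4f}")

# Naive tan^{-1}(y/x) for -1 + 2i lands in the wrong quadrant:
print(math.atan(2 / -1), math.atan2(2, -1))   # -1.107... versus 2.034...
```

The standard library's `cmath.polar` and `cmath.rect` perform the same two conversions. The explicit version just makes the quadrant logic visible. Angles come out in $(-\pi, \pi]$, so $1 - i$ appears as $-\pi/4$ rather than the equivalent $7\pi/4$.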

The complex conjugate of z in polar coordinates is 

$$z^* = x - iy = r\cos\theta - ir\sin\theta = r\cos(-\theta) + ir\sin(-\theta) = re^{-i\theta}.$$

This equation confirms the earlier statement that complex conjugation is equivalent to replacing $i$ with $-i$.

Generally speaking, polar coordinates are useful for operations of multiplication, division, and exponentiation, and Cartesian coordinates for addition and subtraction.

Example 18.2.2. We can use the polar representation of complex numbers to find some trigonometric identities. In all of the following, we set $r = 1$:

$$1 = e^{i\theta}e^{-i\theta} = (\cos\theta + i\sin\theta)(\cos\theta - i\sin\theta) = \cos^2\theta + \sin^2\theta.$$

Now consider the identity

$$e^{i(\theta_1 + \theta_2)} = \cos(\theta_1 + \theta_2) + i\sin(\theta_1 + \theta_2),$$

which can also be written as

$$\begin{aligned}
e^{i(\theta_1 + \theta_2)} &= e^{i\theta_1}e^{i\theta_2} = (\cos\theta_1 + i\sin\theta_1)(\cos\theta_2 + i\sin\theta_2)\\
&= \cos\theta_1\cos\theta_2 - \sin\theta_1\sin\theta_2 + i(\sin\theta_1\cos\theta_2 + \sin\theta_2\cos\theta_1).
\end{aligned}$$

Equating the real and imaginary parts of the last two equations, we obtain

$$\cos(\theta_1 + \theta_2) = \cos\theta_1\cos\theta_2 - \sin\theta_1\sin\theta_2, \qquad \sin(\theta_1 + \theta_2) = \sin\theta_1\cos\theta_2 + \sin\theta_2\cos\theta_1.$$
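Similarly, equating the real and imaginary parts of

$$e^{i3\theta} = \cos 3\theta + i\sin 3\theta$$

and

$$e^{i3\theta} = (e^{i\theta})^3 = (\cos\theta + i\sin\theta)^3 = \cos^3\theta + 3i\cos^2\theta\sin\theta - 3\sin^2\theta\cos\theta - i\sin^3\theta,$$

and then using $\sin^2\theta = 1 - \cos^2\theta$ and $\cos^2\theta = 1 - \sin^2\theta$, gives the following trigonometric identities:

$$\cos 3\theta = 4\cos^3\theta - 3\cos\theta, \qquad \sin 3\theta = 3\sin\theta - 4\sin^3\theta.$$

■

From

$$e^{in\theta} = \cos n\theta + i\sin n\theta \quad\text{and}\quad e^{in\theta} = (e^{i\theta})^n = (\cos\theta + i\sin\theta)^n$$

we obtain the so-called de Moivre theorem:

$$(\cos\theta + i\sin\theta)^n = \cos n\theta + i\sin n\theta. \qquad (18.13)$$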
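Both de Moivre's theorem and the triple-angle identities of Example 18.2.2 are easy to spot-check numerically. The Python sketch below (function names and tolerances are our choices) raises $\cos\theta + i\sin\theta$ to integer powers, including negative ones, and compares the result with $\cos n\theta + i\sin n\theta$.

```python
import math


def de_moivre_holds(theta: float, n: int, tol: float = 1e-9) -> bool:
    """Compare (cos t + i sin t)^n with cos nt + i sin nt, Eq. (18.13)."""
    lhs = complex(math.cos(theta), math.sin(theta)) ** n
    rhs = complex(math.cos(n * theta), math.sin(n * theta))
    return abs(lhs - rhs) < tol


def triple_angle_holds(theta: float, tol: float = 1e-12) -> bool:
    """cos 3t = 4cos^3 t - 3cos t and sin 3t = 3sin t - 4sin^3 t (Example 18.2.2)."""
    c, s = math.cos(theta), math.sin(theta)
    return (abs(math.cos(3 * theta) - (4 * c**3 - 3 * c)) < tol
            and abs(math.sin(3 * theta) - (3 * s - 4 * s**3)) < tol)


print(all(de_moivre_holds(t / 10, n) for t in range(-30, 31) for n in range(-5, 11)))
print(all(triple_angle_holds(t / 10) for t in range(-30, 31)))
```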

Equation (18.11) and its complex conjugate lead to the following useful results:

$$\cos\theta = \tfrac{1}{2}\left(e^{i\theta} + e^{-i\theta}\right), \qquad \sin\theta = \tfrac{1}{2i}\left(e^{i\theta} - e^{-i\theta}\right). \qquad (18.14)$$

As mentioned earlier, the exponential nature of polar coordinates makes them especially useful in multiplication, division, and exponentiation. For instance,

$$\begin{aligned}
\frac{z_1}{z_2} &= \frac{r_1e^{i\theta_1}}{r_2e^{i\theta_2}} = \frac{r_1}{r_2}\,e^{i(\theta_1 - \theta_2)},\\
z_1z_2 &= (r_1e^{i\theta_1})(r_2e^{i\theta_2}) = r_1r_2\,e^{i(\theta_1 + \theta_2)},\\
\sqrt{z} &= \sqrt{re^{i\theta}} = (re^{i\theta})^{1/2} = r^{1/2}(e^{i\theta})^{1/2} = \sqrt{r}\,e^{i\theta/2},
\end{aligned} \qquad (18.15)$$

and so forth.

All of these relations have interesting geometric interpretations. For example, the second equation says that when you multiply a complex number $z_1$ by another complex number $z_2$, you dilate the magnitude of $z_1$ by a factor $r_2$ and increase its angle by $\theta_2$. That is, multiplication involves both a dilation and a rotation. In particular, if we multiply a complex number by $e^{i\omega t}$, where $t$ is time, we get a vector of constant length in the $xy$-plane that is rotating with angular velocity $\omega$.

Example 18.2.3. A plane wave is represented by a periodic function such as $A\cos(kx - \omega t)$ or $B\sin(kx - \omega t)$.
On the other hand, sine and cosine are related by

$$\sin(kx - \omega t) = -\cos\left(kx - \omega t + \frac{\pi}{2}\right).$$

Therefore, one can concentrate solely on the cosine function with a phase angle added to its argument. Thus a typical periodic plane wave is represented as $A\cos(kx - \omega t + \alpha)$. To make connection with the material of this section, we note that

$$\begin{aligned}
A\cos(kx - \omega t + \alpha) &= A\,\mathrm{Re}\left(e^{i(kx - \omega t + \alpha)}\right) = \mathrm{Re}\left(Ae^{i(kx - \omega t + \alpha)}\right)\\
&= \mathrm{Re}\left(Ae^{i\alpha}e^{i(kx - \omega t)}\right) = \mathrm{Re}\left(Ze^{i(kx - \omega t)}\right),
\end{aligned}$$

where $Z$ is a complex number, called the complex amplitude, of magnitude $A$ and argument $\alpha$. It is therefore convenient to represent plane waves by the complex function $Ze^{i(kx - \omega t)}$, which includes the phase of the wave as the argument of $Z$. ■
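The complex-amplitude bookkeeping can be confirmed on a grid of points. In the Python sketch below the parameter values $A$, $\alpha$, $k$, $\omega$ are arbitrary illustrative choices, not taken from the text. It builds $Z = Ae^{i\alpha}$ with `cmath.rect` and checks that $\mathrm{Re}(Ze^{i(kx - \omega t)})$ reproduces $A\cos(kx - \omega t + \alpha)$.

```python
import cmath
import math

# Illustrative parameter values (not from the text)
A, alpha = 1.5, 0.4        # amplitude and phase angle
k, omega = 2.0, 3.0        # wave number and angular frequency
Z = cmath.rect(A, alpha)   # complex amplitude Z = A e^{i alpha}


def wave_real(x: float, t: float) -> float:
    """A cos(kx - omega t + alpha)."""
    return A * math.cos(k * x - omega * t + alpha)


def wave_from_complex(x: float, t: float) -> float:
    """Re(Z e^{i(kx - omega t)})."""
    return (Z * cmath.exp(1j * (k * x - omega * t))).real


max_diff = max(abs(wave_real(x / 7, t / 5) - wave_from_complex(x / 7, t / 5))
               for x in range(-20, 21) for t in range(-20, 21))
print(max_diff)   # at the level of floating-point rounding
```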

Another interesting application of these ideas is finding roots of complex numbers. Suppose we are interested in all the $n$th roots of $Z$, i.e., all $z$'s satisfying $z^n = Z$. To find the roots of a complex number $Z$, write it in polar form in the most general way:

$$Z = Re^{i\Theta + i2\pi k}, \qquad k = 0, \pm 1, \pm 2, \ldots.$$

Thus,

$$z^n = Re^{i\Theta + i2\pi k} \quad\text{with}\quad k = 0, \pm 1, \pm 2, \ldots.$$

Taking the $n$th root of both sides, we obtain

$$z = Z^{1/n} = R^{1/n}e^{i\Theta/n + i2\pi k/n}, \qquad k = 0, \pm 1, \pm 2, \ldots,$$

and


Box 18.2.3. The distinct $n$th roots $\{z_k\}$ of a nonzero $Z = Re^{i\Theta}$ are

$$z_k = R^{1/n}e^{i\Theta/n + i2\pi k/n}, \qquad k = 0, 1, 2, \ldots, n - 1. \qquad (18.16)$$

We see that the number of $n$th roots of a nonzero complex number is exactly $n$.

It is clear that $z_k$ of Equation (18.16) repeats itself for $k \ge n$: since $e^{i2\pi} = 1$, we have $z_{k+n} = z_k$.
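Box 18.2.3 is directly computable. The Python sketch below (function name ours) obtains $R$ and $\Theta$ with `cmath.polar` and generates the $n$ roots by letting $k$ run from 0 to $n - 1$. It reproduces the cube roots of unity of the next example.

```python
import cmath
import math


def nth_roots(Z: complex, n: int) -> list[complex]:
    """The n distinct nth roots of a nonzero Z, following Eq. (18.16):
    z_k = R^(1/n) exp(i Theta/n + i 2 pi k/n), k = 0, 1, ..., n - 1."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    R, Theta = cmath.polar(Z)          # Theta is taken in (-pi, pi]
    return [cmath.rect(R ** (1.0 / n), (Theta + 2 * math.pi * k) / n) for k in range(n)]


for z in nth_roots(1, 3):              # cube roots of unity, Example 18.2.4
    print(f"{z.real:+.4f} {z.imag:+.4f}i", abs(z**3 - 1) < 1e-12)
```

For $Z = 0$ the function returns $n$ copies of zero, which is why the box is stated for nonzero $Z$.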

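Example 18.2.4. Let us find the three cube roots of unity. With $n = 3$ and $Z = e^{i2\pi k}$, we have

$$z_k = e^{i2\pi k/3}, \qquad k = 0, 1, 2,$$

or

$$\begin{aligned}
z_0 &= e^{0} = 1,\\
z_1 &= e^{i2\pi/3} = \cos\frac{2\pi}{3} + i\sin\frac{2\pi}{3} = -\frac{1}{2} + i\frac{\sqrt{3}}{2},\\
z_2 &= e^{i4\pi/3} = \cos\frac{4\pi}{3} + i\sin\frac{4\pi}{3} = -\frac{1}{2} - i\frac{\sqrt{3}}{2}.
\end{aligned}$$

It is instructive to show directly that

$$\left(-\frac{1}{2} + i\frac{\sqrt{3}}{2}\right)^3 = 1 \quad\text{and}\quad \left(-\frac{1}{2} - i\frac{\sqrt{3}}{2}\right)^3 = 1.$$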


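Here are some more examples of finding roots:

$$\sqrt{1 + i} = \left(\sqrt{2}\,e^{i\pi/4 + i2n\pi}\right)^{1/2} = 2^{1/4}e^{i\pi/8 + in\pi}, \qquad n = 0, 1,$$

so that

$$\begin{aligned}
z_0 &= 2^{1/4}e^{i\pi/8} = 2^{1/4}\left(\cos\frac{\pi}{8} + i\sin\frac{\pi}{8}\right) \approx 1.1 + i0.455,\\
z_1 &= 2^{1/4}e^{i\pi/8 + i\pi} = -2^{1/4}e^{i\pi/8} \approx -1.1 - i0.455.
\end{aligned}$$

The equation $z^3 = i$ has the roots

$$z_n = \left(e^{i\pi/2 + i2n\pi}\right)^{1/3} = e^{i\pi/6 + i2n\pi/3}, \qquad n = 0, 1, 2,$$

that is,

$$\begin{aligned}
z_0 &= e^{i\pi/6} = \cos\frac{\pi}{6} + i\sin\frac{\pi}{6} = \frac{\sqrt{3}}{2} + \frac{i}{2},\\
z_1 &= e^{i\pi/6 + i2\pi/3} = \cos\frac{5\pi}{6} + i\sin\frac{5\pi}{6} = -\frac{\sqrt{3}}{2} + \frac{i}{2},\\
z_2 &= e^{i\pi/6 + i4\pi/3} = \cos\frac{3\pi}{2} + i\sin\frac{3\pi}{2} = -i.
\end{aligned}$$

The reader is urged to show that $z_k^3 = i$ for $k = 0, 1, 2$.

Note how careful we were to include the factor of $e^{i2n\pi}$ when taking roots of complex numbers. ■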

All $n$th roots of $Z = Re^{i\Theta}$ are equally spaced on a circle of radius $R^{1/n}$ in the complex plane. Figure 18.6 shows two circles on which the sixth and the eighth roots of unity are located.




Figure 18.6: The (a) sixth and (b) eighth roots of unity. 


Example 18.2.5. In certain applications of electromagnetic wave propagation (as 
in conductors) it becomes necessary to find an analytic expression for the Cartesian 
representation of the square root of a complex number. In this example, we derive 
such an expression. 


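We are trying to calculate the Cartesian representation of the square root of $z = x + iy$. First we express $z$ in polar form; next we take its square root; and finally we reexpress the result in Cartesian form. Thus,

$$z = re^{i(\theta + 2n\pi)}, \qquad \text{where } r = \sqrt{x^2 + y^2}, \quad \tan\theta = \frac{y}{x}, \quad n = 0, \pm 1, \pm 2, \ldots.$$

Taking the square root of both sides yields

$$\begin{aligned}
\sqrt{z} = z^{1/2} &= r^{1/2}e^{i(\theta + 2n\pi)/2} = (x^2 + y^2)^{1/4}e^{i\theta/2 + in\pi}\\
&= \pm(x^2 + y^2)^{1/4}e^{i\theta/2} = \pm(x^2 + y^2)^{1/4}\left(\cos\frac{\theta}{2} + i\sin\frac{\theta}{2}\right),
\end{aligned}$$

because $e^{in\pi} = 1$ if $n$ is even and $e^{in\pi} = -1$ if $n$ is odd. All that is left now is to express the trigonometric functions in terms of $x$ and $y$. Choosing $-\pi < \theta \le \pi$, so that $\cos(\theta/2) \ge 0$ while $\sin(\theta/2)$ has the same sign as $y$, and using $\cos\theta = x/r = x/\sqrt{x^2 + y^2}$, we get

$$\cos\frac{\theta}{2} = \left[\frac{1}{2}(1 + \cos\theta)\right]^{1/2} = \frac{1}{\sqrt{2}}\left(1 + \frac{x}{\sqrt{x^2 + y^2}}\right)^{1/2}.$$

Similarly,

$$\sin\frac{\theta}{2} = \pm\left[\frac{1}{2}(1 - \cos\theta)\right]^{1/2} = \pm\frac{1}{\sqrt{2}}\left(1 - \frac{x}{\sqrt{x^2 + y^2}}\right)^{1/2},$$

with the upper sign for $y \ge 0$ and the lower sign for $y < 0$. Collecting all these formulas together and simplifying, we obtain

$$\sqrt{x + iy} = \pm\frac{1}{\sqrt{2}}\left[\left(\sqrt{x^2 + y^2} + x\right)^{1/2} + i\,\mathrm{sgn}(y)\left(\sqrt{x^2 + y^2} - x\right)^{1/2}\right], \qquad (18.17)$$

where $\mathrm{sgn}(y) = +1$ for $y \ge 0$ and $-1$ for $y < 0$.

The complexity of the expression for the square root rests on our insistence on an analytic form. The process of converting the Cartesian form of a complex number to polar, taking the square root, and converting the result back to Cartesian form is a far easier process than the one leading to Equation (18.17). ■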
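Equation (18.17) can be compared with the library square root. In the Python sketch below the function name is ours. It takes $\mathrm{sgn}(y) = +1$ for $y \ge 0$, which picks the root with nonnegative real part, and checks that root against `cmath.sqrt` for several points in all four quadrants.

```python
import cmath
import math


def sqrt_cartesian(x: float, y: float) -> complex:
    """One square root of x + iy from Eq. (18.17); the other root is its negative.

    sgn(y) is taken as +1 when y >= 0, which selects the root with nonnegative
    real part (the principal root that cmath.sqrt returns for y = +0.0).
    """
    r = math.hypot(x, y)                          # |z| = sqrt(x^2 + y^2)
    sgn = 1.0 if y >= 0 else -1.0
    real = math.sqrt(max(r + x, 0.0) / 2)         # (1/sqrt 2)(r + x)^(1/2)
    imag = sgn * math.sqrt(max(r - x, 0.0) / 2)   # sgn(y)(1/sqrt 2)(r - x)^(1/2)
    return complex(real, imag)


for x, y in [(3, 4), (-4, 0), (1, 1), (-1, -2), (0, -9)]:
    print((x, y), sqrt_cartesian(x, y), cmath.sqrt(complex(x, y)))
```

When $x > 0$ and $|y| \ll x$, the difference $r - x$ loses significant digits in floating point. This is one practical reason why library routines use rearranged formulas rather than (18.17) verbatim.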


18.3 Fourier Series Revisited 


The connection between the trigonometric and exponential functions can be utilized to write the Fourier series expansion of periodic functions more succinctly. If we substitute

$$\cos\frac{2n\pi x}{L} = \frac{e^{2in\pi x/L} + e^{-2in\pi x/L}}{2}, \qquad \sin\frac{2n\pi x}{L} = \frac{e^{2in\pi x/L} - e^{-2in\pi x/L}}{2i}$$

in Equation (10.38) and collect the similar exponential terms, we obtain

$$\begin{aligned}
f(x) &= a_0 + \frac{1}{2}\sum_{n=1}^{\infty}\left[(a_n - ib_n)e^{2in\pi x/L} + (a_n + ib_n)e^{-2in\pi x/L}\right]\\
&= a_0 + \frac{1}{2}\sum_{n=1}^{\infty}(a_n - ib_n)e^{2in\pi x/L} + \frac{1}{2}\sum_{n=1}^{\infty}(a_n + ib_n)e^{-2in\pi x/L}. \qquad (18.18)
\end{aligned}$$
In the second sum, let $n = -m$ to obtain

$$\text{2nd sum} = \frac{1}{2}\sum_{m=-\infty}^{-1}(a_{-m} + ib_{-m})e^{2im\pi x/L} = \frac{1}{2}\sum_{n=-\infty}^{-1}(a_{-n} + ib_{-n})e^{2in\pi x/L}, \qquad (18.19)$$

where in the last step we switched the dummy index back to $n$. If we now introduce new coefficients $A_n$ defined as

$$A_n = \begin{cases} \tfrac{1}{2}(a_n - ib_n) & \text{if } 1 \le n < \infty,\\ \tfrac{1}{2}(a_{-n} + ib_{-n}) & \text{if } -\infty < n \le -1,\\ a_0 & \text{if } n = 0,\end{cases}$$

and use Equation (18.19) in (18.18), we obtain

$$f(x) = \sum_{n=-\infty}^{+\infty}A_n e^{2in\pi x/L}, \qquad \text{where } L = b - a, \qquad (18.20)$$


which is the equation we are after. To find $A_n$ directly from this equation, multiply both sides by $e^{-2ik\pi x/L}$, integrate from $a$ to $b$, and use the readily obtainable relation

$$\int_a^b e^{2i(n-k)\pi x/L}\,dx = \begin{cases} 0 & \text{if } n \neq k,\\ L & \text{if } n = k\end{cases} \;=\; L\,\delta_{nk}, \qquad (18.21)$$

where $\delta_{nk}$ is the Kronecker delta. It follows that

$$A_k = \frac{1}{L}\int_a^b f(x)e^{-2ik\pi x/L}\,dx \quad\text{or}\quad A_n = \frac{1}{L}\int_a^b f(x)e^{-2in\pi x/L}\,dx. \qquad (18.22)$$

It is customary to redefine the coefficients in the summation of Equation (18.20) in such a way that the summation giving $f(x)$ and the integral giving $A_n$ are more symmetric, i.e., have the same constant in front of them. To this end, define $f_n \equiv \sqrt{L}\,A_n$. Then (18.20) and (18.22) become

$$f(x) = \frac{1}{\sqrt{L}}\sum_{n=-\infty}^{+\infty}f_n e^{2in\pi x/L}, \qquad f_n = \frac{1}{\sqrt{L}}\int_a^b f(x)e^{-2in\pi x/L}\,dx. \qquad (18.23)$$


Note that the coefficients $f_n$ are complex; however, when $f(x)$ is a real function, the exponentials and their complex coefficients add up in such a way that the final result can be expressed as an infinite sum of trigonometric functions with real coefficients. In fact, we can show this generally using Equations (18.23). First, we note that, for real $f(x)$,

$$f_n^* = \frac{1}{\sqrt{L}}\int_a^b f(x)e^{+2in\pi x/L}\,dx = \frac{1}{\sqrt{L}}\int_a^b f(x)e^{-2i(-n)\pi x/L}\,dx = f_{-n}. \qquad (18.24)$$
Next, we split the sum in (18.23) into positive integers, negative integers, and zero:

$$f(x) = \frac{1}{\sqrt{L}}\sum_{n=-\infty}^{-1}f_n e^{2in\pi x/L} + \frac{f_0}{\sqrt{L}} + \frac{1}{\sqrt{L}}\sum_{n=1}^{\infty}f_n e^{2in\pi x/L}. \qquad (18.25)$$

Changing the dummy index $n$ to $-m$, the first sum can be rewritten as

$$\text{1st sum} = \frac{1}{\sqrt{L}}\sum_{-m=-\infty}^{-1}f_{-m}e^{-2im\pi x/L} = \frac{1}{\sqrt{L}}\sum_{m=1}^{\infty}f_{-m}e^{-2im\pi x/L} = \frac{1}{\sqrt{L}}\sum_{n=1}^{\infty}f_n^*e^{-2in\pi x/L},$$

where we used Equation (18.24) and changed $m$ back to $n$ at the end. Substituting the last equation in (18.25) yields

$$f(x) = \frac{f_0}{\sqrt{L}} + \frac{1}{\sqrt{L}}\sum_{n=1}^{\infty}\left(f_n^*e^{-2in\pi x/L} + f_n e^{2in\pi x/L}\right),$$

showing that $f(x)$ is indeed real, because each term in the parentheses is the sum of a complex number and its complex conjugate. Equation (18.23) implies that $f_0$ is also real when $f(x)$ is. It is not hard to show that the expression in the parentheses of


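Example 18.3.1. Let us redo the square potential, whose Fourier series was calculated in Example 10.6.1, using exponentials. From Equation (18.23), for $n \neq 0$, we obtain

$$f_n = \frac{1}{\sqrt{2T}}\int_0^{2T}V(t)e^{-2in\pi t/(2T)}\,dt = \frac{1}{\sqrt{2T}}\int_0^{T}V_0e^{-in\pi t/T}\,dt = \frac{V_0}{\sqrt{2T}}\,\frac{T}{-in\pi}\,e^{-in\pi t/T}\bigg|_0^T = \frac{\sqrt{T}\,V_0}{\sqrt{2}\,in\pi}\left[1 - (-1)^n\right]$$

because $e^{-in\pi} = (e^{-i\pi})^n = (-1)^n$. Similarly, $f_0 = V_0\sqrt{T/2}$. We now substitute these in the Fourier series expansion

$$V(t) = \frac{1}{\sqrt{2T}}\sum_{n=-\infty}^{+\infty}f_n e^{2in\pi t/(2T)}$$

to get

$$V(t) = \frac{V_0}{2} + \sum_{n=-\infty}^{-1}\frac{V_0}{2in\pi}\left[1 - (-1)^n\right]e^{2in\pi t/(2T)} + \sum_{n=1}^{\infty}\frac{V_0}{2in\pi}\left[1 - (-1)^n\right]e^{2in\pi t/(2T)}.$$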
If we change the dummy index of the first sum from $n$ to $-m$, and back to $n$ again, and put the two sums together, we obtain

$$\begin{aligned}
V(t) &= \frac{V_0}{2} + \sum_{n=1}^{\infty}\frac{V_0}{2in\pi}\left[1 - (-1)^n\right]\underbrace{\left(e^{in\pi t/T} - e^{-in\pi t/T}\right)}_{=\,2i\sin(n\pi t/T)}\\
&= \frac{V_0}{2} + \frac{2V_0}{\pi}\sum_{k=0}^{\infty}\frac{\sin[(2k+1)\pi t/T]}{2k+1},
\end{aligned}$$

where we used the fact that $1 - (-1)^n$ vanishes for even $n$ and equals 2 for odd $n = 2k + 1$. This is the expansion we obtained in Example 10.6.1 using trigonometric functions. ■
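The closed-form coefficients of this example can be checked against direct numerical integration of (18.23). The Python sketch below assumes, as the integration limits above imply, that $V(t) = V_0$ for $0 < t < T$ and $0$ for $T < t < 2T$, repeated with period $2T$. The values $V_0 = T = 1$, the midpoint rule and the number of samples are our own illustrative choices. The sketch also checks the reality condition (18.24) and evaluates the final sine series at two points.

```python
import cmath
import math

V0, T = 1.0, 1.0   # illustrative height and half-period of the square potential
L = 2 * T          # period of V(t)


def V(t: float) -> float:
    """Assumed square potential: V0 on (0, T), 0 on (T, 2T), repeated with period 2T."""
    return V0 if (t % L) < T else 0.0


def f_n_numeric(n: int, samples: int = 20000) -> complex:
    """f_n = (1/sqrt L) * integral_0^L V(t) exp(-2 i n pi t/L) dt by the midpoint rule."""
    h = L / samples
    total = sum(V((j + 0.5) * h) * cmath.exp(-2j * n * math.pi * (j + 0.5) * h / L)
                for j in range(samples))
    return total * h / math.sqrt(L)


def f_n_exact(n: int) -> complex:
    """Closed form from the example: f_0 = V0 sqrt(T/2) and, for n != 0,
    f_n = sqrt(T) V0 [1 - (-1)^n] / (sqrt(2) i n pi)."""
    if n == 0:
        return complex(V0 * math.sqrt(T / 2))
    return math.sqrt(T) * V0 * (1 - (-1) ** n) / (math.sqrt(2) * 1j * n * math.pi)


def partial_sum(t: float, terms: int = 2000) -> float:
    """V0/2 + (2 V0/pi) * sum_{k < terms} sin[(2k+1) pi t/T] / (2k+1)."""
    return V0 / 2 + (2 * V0 / math.pi) * sum(
        math.sin((2 * k + 1) * math.pi * t / T) / (2 * k + 1) for k in range(terms))


for n in range(-3, 4):
    print(n, f_n_numeric(n), f_n_exact(n))
print(partial_sum(T / 2), partial_sum(3 * T / 2))   # close to V0 and to 0
```

Because the jump of $V(t)$ falls exactly on a grid boundary, the midpoint rule never samples the discontinuity, and the coefficients agree to roughly $10^{-7}$. The partial sums converge slowly, as expected for a discontinuous function.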


18.4 A Representation of Delta Function 

Consider the function $D_T(x - x_0)$ defined as

$$D_T(x - x_0) = \frac{1}{2\pi}\int_{-T}^{T}e^{i(x - x_0)t}\,dt. \qquad (18.26)$$

The integral is easily evaluated, with the result

$$D_T(x - x_0) = \frac{1}{2\pi}\,\frac{e^{i(x - x_0)t}}{i(x - x_0)}\bigg|_{-T}^{T} = \frac{1}{\pi}\,\frac{\sin T(x - x_0)}{x - x_0}.$$

The graph of $D_T(x)$ as a function of $x$ for various values of $T$ is shown in Figure 18.7. Note that the width of the curve decreases as $T$ increases. The area under the curve can be calculated:

$$\int_{-\infty}^{\infty}D_T(x - x_0)\,dx = \frac{1}{\pi}\int_{-\infty}^{\infty}\frac{\sin T(x - x_0)}{x - x_0}\,dx = \frac{1}{\pi}\int_{-\infty}^{\infty}\frac{\sin y}{y}\,dy = 1.$$

Figure 18.7 shows that $D_T(x - x_0)$ becomes more and more like the Dirac delta function as $T$ gets larger and larger. In fact, we have

$$\delta(x - x_0) = \lim_{T\to\infty}D_T(x - x_0) = \lim_{T\to\infty}\frac{1}{\pi}\,\frac{\sin T(x - x_0)}{x - x_0}. \qquad (18.27)$$

To see this, we note that for any finite $T$ we can write

$$D_T(x - x_0) = \frac{T}{\pi}\,\frac{\sin T(x - x_0)}{T(x - x_0)}.$$

Furthermore, for values of $x$ that are very close to $x_0$,

$$T(x - x_0) \to 0 \quad\text{and}\quad \frac{\sin T(x - x_0)}{T(x - x_0)} \to 1.$$
Figure 18.7: The function sin Tx/x also approaches the Dirac delta function as the width of the curve approaches zero. The value of T is 0.5 for the dashed curve, 2 for the heavy curve, and 15 for the light curve.


Thus, for such values of $x$ and $x_0$, we have $D_T(x-x_0) \approx T/\pi$, which is large when $T$ is large. This is as expected of a delta function: $\delta(0) = \infty$. On the other hand, the width of $D_T(x-x_0)$ around $x_0$ is given, roughly, by the distance between the points at which $D_T(x-x_0)$ first drops to zero: $T(x-x_0) = \pm\pi$, or $x - x_0 = \pm\pi/T$. This width is roughly $\Delta x = 2\pi/T$, which goes to zero as $T$ grows. Again, this is as expected of the delta function. Therefore, from (18.26) and (18.27), we have the following important representation of the Dirac delta function:

$$\delta(x-x_0) = \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{i(x-x_0)t}\,dt. \tag{18.28}$$
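To see the limit (18.27) at work, the sketch below integrates $D_T(x-x_0)f(x)$ numerically for a few values of $T$; as $T$ grows, the result should approach $f(x_0)$, which is the defining property of $\delta(x-x_0)$. The Gaussian test function $f(x) = e^{-x^2}$, the point $x_0 = 0.3$, the truncation of the integral to $[-10, 10]$, and the grid size are assumptions made purely for illustration.

```python
import math

def D_T(u, T):
    """Finite-T kernel D_T(u) = sin(T u) / (pi u) from Eq. (18.26), with D_T(0) = T / pi."""
    if u == 0.0:
        return T / math.pi
    return math.sin(T * u) / (math.pi * u)

def smeared_value(f, x0, T, a=-10.0, b=10.0, n=20000):
    """Midpoint-rule approximation of the integral of D_T(x - x0) f(x) over [a, b]."""
    h = (b - a) / n
    return h * sum(D_T(a + (k + 0.5) * h - x0, T) * f(a + (k + 0.5) * h) for k in range(n))

f = lambda x: math.exp(-x * x)   # smooth, rapidly decaying test function (assumption)
for T in (1, 5, 20):
    print(T, smeared_value(f, 0.3, T), f(0.3))   # approaches f(0.3) as T grows
```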


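Equation (18.28) can be generalized to higher dimensions, because (at least in Cartesian coordinates) the multidimensional Dirac delta function is just the product of the one-dimensional delta functions. Using the more common $\mathbf{k}$ instead of $t$ as the variable of integration, the two-dimensional Dirac delta function can be represented as

$$\delta(\mathbf{r}-\mathbf{r}_0) = \frac{1}{(2\pi)^2}\int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} e^{i\mathbf{k}\cdot(\mathbf{r}-\mathbf{r}_0)}\,dk_x\,dk_y = \frac{1}{(2\pi)^2}\iint_{\infty} e^{i\mathbf{k}\cdot(\mathbf{r}-\mathbf{r}_0)}\,d^2k, \tag{18.29}$$

where $\iint_\infty$ means integration over the entire $k_xk_y$-plane, and in the last integral we substituted $d^2k$ for $dk_x\,dk_y$.

Similarly, the three-dimensional Dirac delta function has the following representation:

$$\delta(\mathbf{r}-\mathbf{r}_0) = \frac{1}{(2\pi)^3}\iiint_{\infty} e^{i\mathbf{k}\cdot(\mathbf{r}-\mathbf{r}_0)}\,d^3k, \tag{18.30}$$

where $d^3k$ means a triple integral over $\mathbf{k}$ and $\iiint_\infty$ means over all of $\mathbf{k}$-space.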














18.5 Problems 


(j) 


(c) (a — ib)(2a + 2ib). 
1 + 3 * 


18.1. Find the real and imaginary parts of the following complex numbers: 
(a) (2 — i)(3 + 2i). 

(d) jij. (e) 

(g) 1 ± 2 ±. (h) 

2-3 i K ’ 

(k) 


(b) (2 - 3*)(1 + *). 
1 + * 

2-f 
2 

l~i' 

1 + 2 i 2 — i 


(f) 

(i) 


1 - 2 * 
1 — i 
1 + i 


(l-*)(2-*)(3-i) 


3-4* 


5* 


18.2. Convert the following complex numbers to polar form and find all cube roots of each:

(a) $2-i$. (b) $2-3i$. (c) $3-2i$. (d) $i$. (e) $-i$.

(f) $\dfrac{i}{1+i}$. (g) $\dfrac{1+i}{2-i}$. (h) $\dfrac{1+3i}{1-2i}$. (i) $1+i\sqrt{3}$. (j) $\dfrac{2+3i}{3-4i}$.

(k) $27i$. (l) $-64$. (m) $2-5i$. (n) $1+i$. (o) $1-i$. (p) $5+2i$.


18.3. Using polar coordinates, show that:

(a) $(-1+i)^7 = -8(1+i)$. (b) $(1+i\sqrt{3})^{-10} = 2^{-11}(-1+i\sqrt{3})$.

18.4. Find the real and imaginary parts of the following: 

(a) (l + iv 7 ^) 3 . (b) (2 + *) 53 . (c) <Ti. (d) yj 1 + iV%. 

f\ - A 81 

(e) (1 + *V3) 63 . (f) (—i) . (g)^=i. (h)^=T. 

(i) ■ (j) (i + i) 22 . (k )VT=i. (l) (i -if. 

18.5. Find the Cartesian form of all complex numbers $z$ which satisfy (a) $z^3 + 1 = 0$, and (b) $z^4 - 16i = 0$.

18.6. Find the absolute value of - — ^ and — — . 

3-4* a - ib 

18.7. Derive the following trigonometric identities:

$$\cos 3\theta = 4\cos^3\theta - 3\cos\theta, \qquad \sin 3\theta = 3\sin\theta - 4\sin^3\theta.$$


18.8. Show that Equation (18.11) leads to Equation (18.14). 
18.9. Show that $z$ is real if and only if $z = z^*$.

18.10. Show that $|\mathrm{Re}(z)| + |\mathrm{Im}(z)| \ge |z| \ge (|\mathrm{Re}(z)| + |\mathrm{Im}(z)|)/\sqrt{2}$.

18.11. Let $z_1 = x_1 + iy_1$ and $z_2 = x_2 + iy_2$ represent two planar vectors $\mathbf{z}_1$ and $\mathbf{z}_2$. Show that

$$z_1 z_2^* = \mathbf{z}_1\cdot\mathbf{z}_2 - i\,\hat{\mathbf{e}}_z\cdot(\mathbf{z}_1\times\mathbf{z}_2).$$

18.12. Sketch the set of points determined by each of the following conditions:

(a) $|z-2+i| = 2$. (b) $|z+2i| < 4$. (c) $|z+i| = |z-i|$.

(d) $\mathrm{Im}(z^*+i) = 2$. (e) $2z + 3z^* = 1$. (f) $z^2 + (z^*)^2 = 2$.

Hint: Find a relation between $x$ and $y$.

18.13. Show that the equation of a circle of radius $r$ centered at $z_0$ can be written as $|z|^2 - 2\,\mathrm{Re}(z z_0^*) = r^2 - |z_0|^2$.

18.14. Given that $z_1 z_2 \neq 0$, show that

(a) $\mathrm{Re}(z_1 z_2^*) = |z_1||z_2|$, and $|z_1 + z_2| = |z_1| + |z_2|$, if and only if $\arg(z_1) - \arg(z_2) = 2n\pi$, for $n = 0, \pm1, \pm2, \ldots$.

(b) What does the second equality mean geometrically?

18.15. Assume that $z \neq 1$ and $z^n = 1$. Show that $1 + z + z^2 + \cdots + z^{n-1} = 0$.

18.16. Substitute $x + iy$ for $z$ in $z^2 + z + 1 = 0$ and solve the resulting equations for $x$ and $y$. Compare these with the roots obtained by solving the equation in $z$ directly.

18.17. Find the roots of $z^4 + 4 = 0$ and use them to factor $z^4 + 4$ into a product of quadratic polynomials with real coefficients. Hint: First factor $z^4 + 4$ into linear terms.


18.18. Evaluate the following roots and plot them on the complex plane: 
(a) tyl+i. (b) (c) v 7 !. (d) 32- 

(e) V3 + 4 i. (f) v^-1- (g) v 7 —16L (h) \/^l. 


18.19. Use binomial expansion to show directly that

$$\left(-\frac{1}{2} + i\frac{\sqrt{3}}{2}\right)^3 = 1 \quad\text{and}\quad \left(-\frac{1}{2} - i\frac{\sqrt{3}}{2}\right)^3 = 1.$$


18.20. Use $\int e^{ax}\,dx = e^{ax}/a$ to find the indefinite integral of $\sin^2 x$. Verify that the derivative of your answer is indeed $\sin^2 x$.

18.21. Use $\int e^{ax}\,dx = e^{ax}/a$ and $e^{ibx} = \cos(bx) + i\sin(bx)$ to verify the following relations by integrating a certain complex exponential:

$$\int e^{ax}\cos(bx)\,dx = \frac{e^{ax}}{a^2+b^2}\left[a\cos(bx) + b\sin(bx)\right],$$

$$\int e^{ax}\sin(bx)\,dx = \frac{e^{ax}}{a^2+b^2}\left[a\sin(bx) - b\cos(bx)\right],$$

where $a$ and $b$ are assumed to be real constants.
18.22. (a) Using $\sum_{k=1}^{N} r^k = (r^{N+1}-r)/(r-1)$, evaluate the sum $\sum_{k=1}^{N} e^{i(\alpha-\beta k)}$. In particular, show that

$$\sum_{k=1}^{N} e^{i(\alpha-\beta k)} = e^{i(\alpha-\beta)}\,\frac{e^{-i\beta N}-1}{e^{-i\beta}-1}.$$

(b) Now show that if $\beta = 2\pi/N$, then

$$\sum_{k=1}^{N}\cos(\alpha-\beta k) = 0 = \sum_{k=1}^{N}\sin(\alpha-\beta k).$$


18.23. Express $\cos 4\theta$ and $\sin 4\theta$ in terms of powers of $\cos\theta$ and $\sin\theta$.

18.24. Use mathematical induction to show the de Moivre theorem. 

18.25. Using binomial expansion and the de Moivre theorem, show that 



where $[x]$ stands for the greatest integer less than or equal to $x$.

18.26. Derive Equation (18.17) from the equations preceding it. 

18.27. Find the following sums, where $\alpha$ and $\beta$ are real:

(a) $\cos\alpha + \cos(\alpha+\beta) + \cos(\alpha+2\beta) + \cdots + \cos(\alpha+n\beta)$.

(b) $\sin\alpha + \sin(\alpha+\beta) + \sin(\alpha+2\beta) + \cdots + \sin(\alpha+n\beta)$.

Hint: Use the result of Problem 18.22.

18.28. Show that

$$\int_a^b e^{2i(n-k)\pi x/L}\,dx = \begin{cases} 0 & \text{if } n \neq k, \\ L & \text{if } n = k, \end{cases}$$

where $b = a + L$.

18.29. Use Equations (18.20) and (18.21) to obtain Equation (18.22).

18.30. Find the Fourier series expansion of Problem 10.22 using complex 
exponentials. 

18.31. An electric voltage V(t) is given by 

$$V(t) = V_0\sin(\pi t/T), \qquad 0 < t < T,$$


and repeats itself with period T. Find the Fourier series expansion of V (t) 
using complex exponential functions. 
18.32. A periodic voltage is given by the formula

$$V(t) = \begin{cases} V_0\sin(\pi t/2T) & \text{if } 0 < t < T, \\ 0 & \text{if } T < t < 2T, \end{cases}$$

in the interval $(0, 2T)$. Find the Fourier series representation of this voltage using complex exponential functions.

18.33. A periodic voltage with period 4T is given by 

= ,f 

[o if T < \t\ < 2T. 


Write the Fourier series of V ( t ) using complex exponential functions. 

18.34. The function $f(x)$ is given by the integral

$$f(x) = \int_{-\infty}^{\infty} g(y)\,e^{ixy}\,dy.$$

Find $g(y)$ as an integral over $f(x)$. Hint: Multiply both sides of the equation by $e^{-ixz}$ and integrate over $x$, changing the order of integration on the right-hand side and using (18.28).




Chapter 19 

Complex Derivative 
and Integral 


So far we have concerned ourselves with the algebra of the complex numbers. 
The subject of complex analysis is extremely rich and important. The scope 
and the level of this book do not allow a comprehensive treatment of complex
analysis. Therefore, we shall briefly review some of the more elementary 
topics and encourage the reader to refer to more advanced books for a more 
comprehensive treatment. We start here, as is done in real analysis, with the 
notion of a function. 


19.1 Complex Functions 

A complex function $f(z)$ is a rule that associates one complex number to another. We write $f(z) = w$ where both $z$ and $w$ are complex numbers. The function $f$ can be geometrically thought of as a correspondence between two complex planes, the $z$-plane and the $w$-plane. In the real case, this correspondence can be represented by a graph. It could also be represented by arrows from one real line (the $x$-axis) to another real line (the $y$-axis) joining a point of the first real line to the image point of the second real line. When the possibility of a graph is available, the second representation of real functions appears prohibitively clumsy! For complex functions, no graph is available, because one cannot draw pictures in four dimensions!¹ Therefore, the second alternative is our only choice. The $w$-plane has a real axis and an imaginary axis, which we can call $u$ and $v$, respectively. Both $u$ and $v$ are real functions of the coordinates of $z$, i.e., $x$ and $y$. Therefore, we may write

f(z) = u(x,y) + iv(x,y). (19.1)

¹ The “graph” of a complex function would be a collection of pairs $(z, f(z))$ just as the graph of a real function is a collection of pairs $(x, f(x))$. While in the latter case the graph can be drawn in the $(x, y)$ plane, the former needs four dimensions because both $z$ and $f(z)$ have two components each.


graph of a 
complex function 
is impossible to 
visualize because 
it lives in a four 
dimensional space 
Figure 19.1: A map from the $z$-plane to the $w$-plane.


This equation gives a unique point $(u, v)$ in the $w$-plane for each point $(x, y)$ in the $z$-plane (see Figure 19.1). Under $f$, regions of the $z$-plane are mapped onto regions of the $w$-plane. For instance, a curve in the $z$-plane may be mapped into a curve in the $w$-plane.

Example 19.1.1. Let us investigate the behavior of some elementary complex functions. In particular, we shall look at the way a line $y = mx$ in the $z$-plane is mapped to lines and curves in the $w$-plane by the action of these functions.

(a) Let us start with the simple function $w = f(z) = z^2$. We have

$$w = (x+iy)^2 = x^2 - y^2 + 2ixy$$

with $u(x,y) = x^2 - y^2$ and $v(x,y) = 2xy$. For $y = mx$, these equations yield $u = (1-m^2)x^2$ and $v = 2mx^2$. Eliminating $x$ in these equations, we find $v = [2m/(1-m^2)]\,u$. This is a line passing through the origin of the $w$-plane [see Figure 19.2(a)]. Note that the angle the line in the $w$-plane makes with its real axis is twice the angle $\alpha$ the line in the $z$-plane makes with the $x$-axis, because with $\tan\alpha = m$ we have $\tan 2\alpha = 2m/(1-m^2)$.

(b) Now let us consider $w = f(z) = e^z = e^{x+iy}$, which gives $u(x,y) = e^x\cos y$ and $v(x,y) = e^x\sin y$. Substituting $y = mx$, we obtain $u = e^x\cos mx$ and $v = e^x\sin mx$. Unlike part (a), we cannot eliminate $x$ to find $v$ as an explicit function of $u$. Nevertheless, the last pair of equations are the parametric equations of a curve (with $x$ as the parameter) which we can plot in a $uv$-plane as shown in Figure 19.2(b). ■

Figure 19.2: (a) The map $z^2$ takes a line with slope angle $\alpha$ and maps it onto a line with twice the angle in the $w$-plane. (b) The map $e^z$ takes the same line and maps it onto a spiral in the $w$-plane.
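The two maps of Example 19.1.1 can be checked point by point. The sketch below (Python standard library; the slope $m = 0.5$ and the sample points are arbitrary choices) confirms that the images of $y = mx$ under $z^2$ lie on $v = [2m/(1-m^2)]u$, that the slope angle doubles (valid here because $|m| < 1$), and that under $e^z$ the image point has modulus $e^x$ and polar angle $mx$, which is what traces out the spiral.

```python
import cmath
import math

def image_of_line(f, m, xs):
    """Pairs (x, w) with w = f(z) for the points z = x + i m x on the line y = m x."""
    return [(x, f(complex(x, m * x))) for x in xs]

m = 0.5                                   # slope of the line in the z-plane (illustrative)
xs = [k / 10 for k in range(-20, 21)]

# (a) w = z**2: every image should satisfy v = [2m / (1 - m**2)] u
slope = 2 * m / (1 - m ** 2)
line_ok = all(abs(w.imag - slope * w.real) < 1e-12
              for _, w in image_of_line(lambda z: z * z, m, xs))
print(line_ok, math.degrees(math.atan(m)), math.degrees(math.atan(slope)))  # angle doubles

# (b) w = e**z: u = e**x cos(mx), v = e**x sin(mx), so |w| = e**x and the polar angle is m x
spiral_ok = all(
    math.isclose(abs(w), math.exp(x), rel_tol=1e-12)
    and abs(w - math.exp(x) * complex(math.cos(m * x), math.sin(m * x))) < 1e-12
    for x, w in image_of_line(cmath.exp, m, xs)
)
print(spiral_ok)
```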
19.1.1 Derivatives of Complex Functions

Limits of complex functions are defined in terms of absolute values. Thus, $\lim_{z\to a} f(z) = w_0$ means that, given any real number $\epsilon > 0$, we can find a corresponding real number $\delta > 0$ such that $|f(z) - w_0| < \epsilon$ whenever $0 < |z - a| < \delta$. Similarly, we say that a function $f$ is continuous at $z = a$ if $\lim_{z\to a} f(z) = f(a)$, i.e., if for every $\epsilon > 0$ there exists a $\delta > 0$ such that $|f(z) - f(a)| < \epsilon$ whenever $|z - a| < \delta$.

The derivative of a complex function is defined as usual:

Definition 19.1.1. Let $f(z)$ be a complex function. The derivative of $f$ at $z_0$ is

$$\left.\frac{df}{dz}\right|_{z_0} = \lim_{\Delta z\to 0}\frac{f(z_0+\Delta z) - f(z_0)}{\Delta z},$$

provided the limit exists and is independent of $\Delta z$.

In this definition “independent of $\Delta z$” means independent of $\Delta x$ and $\Delta y$ (the components of $\Delta z$) and, therefore, independent of the direction of approach to $z_0$. The restrictions of this definition apply to the real case as well. For instance, the derivative of $f(x) = |x|$ at $x = 0$ does not exist because the difference quotient approaches $+1$ from the right and $-1$ from the left.

It can easily be shown that all the formal rules of differentiation that apply to the real case also apply to the complex case. For example, if $f$ and $g$ are differentiable, then $f \pm g$, $fg$, and, as long as $g$ is not zero, $f/g$ are also differentiable, and their derivatives are given by the usual rules of differentiation.


Box 19.1.1. A function $f(z)$ is called analytic at $z_0$ if it is differentiable at $z_0$ and at all other points in some neighborhood of $z_0$. A point at which $f$ is analytic is called a regular point of $f$. A point at which $f$ is not analytic is called a singular point or a singularity of $f$. A function for which all points in $\mathbb{C}$ are regular is called an entire function.

example illustrating path-dependent derivative

Example 19.1.2. Let us examine the derivative of $f(z) = x + 2iy$ at $z = 0$:

$$\left.\frac{df}{dz}\right|_{z=0} = \lim_{\Delta z\to 0}\frac{f(\Delta z) - f(0)}{\Delta z} = \lim_{\substack{\Delta x\to 0\\ \Delta y\to 0}}\frac{\Delta x + 2i\Delta y}{\Delta x + i\Delta y}.$$

In general, along a line that goes through the origin, $y = mx$, the limit yields

$$\left.\frac{df}{dz}\right|_{z=0} = \lim_{\Delta x\to 0}\frac{\Delta x + 2im\Delta x}{\Delta x + im\Delta x} = \frac{1+2im}{1+im}.$$

This indicates that we get infinitely many values for the derivative depending on the value we assign to $m$, corresponding to different directions of approach to the origin. Thus, the derivative does not exist at $z = 0$. ■
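The direction dependence in Example 19.1.2 is easy to see numerically. The sketch below evaluates the difference quotient $[f(z_0+\Delta z) - f(z_0)]/\Delta z$ with $\Delta z$ pointing along $y = mx$; for $f(z) = x + 2iy$ it reproduces $(1+2im)/(1+im)$, while for $z^2$ (added only for comparison) it gives essentially the same value, $2z_0$, in every direction. The step size $h$ and the helper name are illustrative assumptions.

```python
import cmath
import math

def difference_quotient(f, z0, theta, h=1e-6):
    """[f(z0 + dz) - f(z0)] / dz with dz = h * exp(i theta): approach z0 along direction theta."""
    dz = h * cmath.exp(1j * theta)
    return (f(z0 + dz) - f(z0)) / dz

f = lambda z: z.real + 2j * z.imag        # f(z) = x + 2iy of Example 19.1.2
g = lambda z: z * z                        # z**2, used here only for comparison

for m in (0.0, 1.0, 3.0):
    theta = math.atan(m)                   # direction of the line y = m x
    print(m, difference_quotient(f, 0j, theta), (1 + 2j * m) / (1 + 1j * m))
    print("   z**2 at 1+i:", difference_quotient(g, 1 + 1j, theta))  # close to 2(1+i) for every m
```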
A question arises naturally at this point: Under what conditions does the limit in the definition of derivative exist? We will find the necessary and sufficient conditions for the existence of that limit. It is clear from the definition that differentiability puts a severe restriction on $f(z)$ because it requires the limit to be the same for all paths going through $z_0$, the point at which the derivative is being calculated. Another important point to keep in mind is that differentiability is a local property. To test whether or not a function $f(z)$ is differentiable at $z_0$, we move away from $z_0$ by a small amount $\Delta z$ and check the existence of the limit in Definition 19.1.1.

For $f(z) = u(x,y) + iv(x,y)$, Definition 19.1.1 yields

$$\left.\frac{df}{dz}\right|_{z_0} = \lim_{\substack{\Delta x\to 0\\ \Delta y\to 0}}\left[\frac{u(x_0+\Delta x, y_0+\Delta y) - u(x_0,y_0)}{\Delta x + i\Delta y} + i\,\frac{v(x_0+\Delta x, y_0+\Delta y) - v(x_0,y_0)}{\Delta x + i\Delta y}\right].$$


Cauchy-Riemann 

conditions 


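If this limit is to exist for all paths, it must exist for the two particular paths on which $\Delta y = 0$ (parallel to the $x$-axis) and $\Delta x = 0$ (parallel to the $y$-axis). For the first path we get

$$\left.\frac{df}{dz}\right|_{z_0} = \lim_{\Delta x\to 0}\frac{u(x_0+\Delta x, y_0) - u(x_0,y_0)}{\Delta x} + i\lim_{\Delta x\to 0}\frac{v(x_0+\Delta x, y_0) - v(x_0,y_0)}{\Delta x} = \left.\frac{\partial u}{\partial x}\right|_{(x_0,y_0)} + i\left.\frac{\partial v}{\partial x}\right|_{(x_0,y_0)}.$$

For the second path ($\Delta x = 0$), we obtain

$$\left.\frac{df}{dz}\right|_{z_0} = \lim_{\Delta y\to 0}\frac{u(x_0, y_0+\Delta y) - u(x_0,y_0)}{i\Delta y} + i\lim_{\Delta y\to 0}\frac{v(x_0, y_0+\Delta y) - v(x_0,y_0)}{i\Delta y} = -i\left.\frac{\partial u}{\partial y}\right|_{(x_0,y_0)} + \left.\frac{\partial v}{\partial y}\right|_{(x_0,y_0)}.$$

If $f$ is to be differentiable at $z_0$, the derivatives along the two paths must be equal. Equating the real and imaginary parts of both sides of this equation and ignoring the subscript $z_0$ ($x_0$, $y_0$, or $z_0$ is arbitrary), we obtain

$$\frac{\partial u}{\partial x} = \frac{\partial v}{\partial y}, \qquad \frac{\partial u}{\partial y} = -\frac{\partial v}{\partial x}. \tag{19.2}$$

These two conditions, which are necessary for the differentiability of $f$, are called the Cauchy-Riemann (C-R) conditions.

The arguments leading to Equation (19.2) imply that the derivative, if it exists, can be expressed as

$$\frac{df}{dz} = \frac{\partial u}{\partial x} + i\frac{\partial v}{\partial x} = \frac{\partial v}{\partial y} - i\frac{\partial u}{\partial y}. \tag{19.3}$$

The C-R conditions assure us that these two expressions are equivalent.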
Example 19.1.3. Let us examine the differentiability of some complex functions.

(a) We have already established that $f(z) = x + 2iy$ is not differentiable at $z = 0$. We can now show that it has no derivative at any point in the complex plane. This is easily seen by noting that $u = x$ and $v = 2y$, and that $\partial u/\partial x = 1 \neq \partial v/\partial y = 2$, so the first C-R condition is not satisfied. The second C-R condition is satisfied, but that is not enough.

(b) Now consider $f(z) = x^2 - y^2 + 2ixy$, for which $u = x^2 - y^2$ and $v = 2xy$. The C-R conditions become $\partial u/\partial x = 2x = \partial v/\partial y$ and $\partial u/\partial y = -2y = -\partial v/\partial x$. Thus, $f(z)$ may be differentiable. Recall that the C-R conditions are only necessary conditions; we do not know as yet if they are also sufficient.

(c) Let $u(x,y) = e^x\cos y$ and $v(x,y) = e^x\sin y$. Then $\partial u/\partial x = e^x\cos y = \partial v/\partial y$ and $\partial u/\partial y = -e^x\sin y = -\partial v/\partial x$, and the C-R conditions are satisfied. ■
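The three cases of Example 19.1.3 can also be probed numerically. The sketch below estimates the partial derivatives of $u$ and $v$ by central differences (using the fact that $f(z \pm h)$ and $f(z \pm ih)$ shift $x$ and $y$, respectively) and returns the two C-R residuals $u_x - v_y$ and $u_y + v_x$. The sample point $(0.7, -0.4)$ and the step $h$ are arbitrary assumptions; a vanishing residual is only numerical evidence, not a proof of differentiability.

```python
import cmath

def cr_residuals(f, x, y, h=1e-5):
    """Central-difference estimates of u_x - v_y and u_y + v_x for f = u + i v at (x, y).
    Both residuals vanish when the Cauchy-Riemann conditions (19.2) hold."""
    z = complex(x, y)
    fx = (f(z + h) - f(z - h)) / (2 * h)            # = u_x + i v_x
    fy = (f(z + 1j * h) - f(z - 1j * h)) / (2 * h)  # = u_y + i v_y
    return fx.real - fy.imag, fy.real + fx.imag

examples = {
    "x + 2iy": lambda z: z.real + 2j * z.imag,      # Example 19.1.3(a)
    "z**2": lambda z: z * z,                        # Example 19.1.3(b)
    "e**z": cmath.exp,                              # Example 19.1.3(c)
}
for name, f in examples.items():
    print(name, cr_residuals(f, 0.7, -0.4))         # (a): first residual is -1, second is 0
```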


The requirement of differentiability is very restrictive: the derivative must exist along infinitely many paths. On the other hand, the C-R conditions seem deceptively mild: they are derived for only two paths. Nevertheless, the two paths are, in fact, true representatives of all paths; that is, the C-R conditions are not only necessary, but also sufficient (provided the first partial derivatives of $u$ and $v$ are continuous). This is the content of the Cauchy-Riemann theorem, which we state without proof:²


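Theorem 19.1.4. (Cauchy-Riemann Theorem). The function $f(z) = u(x,y) + iv(x,y)$ is differentiable in a region of the complex plane if and only if the Cauchy-Riemann conditions

$$\frac{\partial u}{\partial x} = \frac{\partial v}{\partial y} \quad\text{and}\quad \frac{\partial u}{\partial y} = -\frac{\partial v}{\partial x}$$

are satisfied and all first partial derivatives of $u$ and $v$ are continuous in that region. In that case

$$\frac{df}{dz} = \frac{\partial u}{\partial x} + i\frac{\partial v}{\partial x} = \frac{\partial v}{\partial y} - i\frac{\partial u}{\partial y}.$$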

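The C-R conditions readily lead to

$$\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} = 0, \qquad \frac{\partial^2 v}{\partial x^2} + \frac{\partial^2 v}{\partial y^2} = 0, \tag{19.4}$$

(differentiate the first C-R condition with respect to $x$ and the second with respect to $y$, and use the equality of the mixed partial derivatives of $v$; this is legitimate because analytic functions turn out to have continuous partial derivatives of all orders), i.e., both real and imaginary parts of an analytic function satisfy the two-dimensional Laplace equation [Equations (15.13) and (15.15)]. Such functions are called harmonic functions.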
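A quick numerical illustration of (19.4): the sketch below applies the standard five-point finite-difference Laplacian to the real and imaginary parts of an entire function and to $x^2 + y^2$, which is not harmonic (its Laplacian is exactly 4). The choice $f(z) = z^2 e^z$, the sample points, and the step $h$ are illustrative assumptions; the small nonzero values printed for $u$ and $v$ are discretization error of order $h^2$.

```python
import cmath

def laplacian(phi, x, y, h=1e-3):
    """Five-point finite-difference estimate of phi_xx + phi_yy at (x, y)."""
    return (phi(x + h, y) + phi(x - h, y) + phi(x, y + h) + phi(x, y - h) - 4 * phi(x, y)) / h ** 2

f = lambda x, y: cmath.exp(complex(x, y)) * complex(x, y) ** 2  # z**2 e**z, an entire function
u = lambda x, y: f(x, y).real
v = lambda x, y: f(x, y).imag
non_harmonic = lambda x, y: x * x + y * y                       # Laplacian is exactly 4

for x, y in [(0.3, 0.8), (-1.1, 0.2)]:
    print(laplacian(u, x, y), laplacian(v, x, y), laplacian(non_harmonic, x, y))
```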


Example 19.1.5. Let us consider some examples of derivatives of complex functions.

(a) $f(z) = z$.

Here $u = x$ and $v = y$, the C-R conditions are easily shown to hold, and for

² For a simple proof, see Hassani, S. Mathematical Physics: A Modern Introduction to
Its Foundations, Springer-Verlag, 1999, Chapter 9.


harmonic 
functions defined 
complex

trigonometric 

functions 


complex 

hyperbolic 

functions 


any $z$, we have $df/dz = \partial u/\partial x + i\,\partial v/\partial x = 1$. Therefore, the derivative exists at all points of the complex plane, i.e., $f(z) = z$ is entire.

(b) $f(z) = z^2$.

Here $u = x^2 - y^2$ and $v = 2xy$; the C-R conditions hold, and for all points $z$ of the complex plane, we have $df/dz = \partial u/\partial x + i\,\partial v/\partial x = 2x + i2y = 2z$. Therefore, $f(z)$ is differentiable at all points. So, $f(z) = z^2$ is also entire.

(c) $f(z) = z^n$ for $n > 1$.

We can use mathematical induction and the fact that the product of two entire functions is an entire function to show that $\dfrac{d}{dz}(z^n) = nz^{n-1}$.

(d) $f(z) = a_0 + a_1 z + \cdots + a_{n-1}z^{n-1} + a_n z^n$,

where $a_i$ are arbitrary constants. That $f(z)$ is entire follows directly from (c) and the fact that the sum of two entire functions is entire.

(e) $f(z) = e^z$.

Here $u(x,y) = e^x\cos y$ and $v(x,y) = e^x\sin y$. Thus, $\partial u/\partial x = e^x\cos y = \partial v/\partial y$ and $\partial u/\partial y = -e^x\sin y = -\partial v/\partial x$, and the C-R conditions are satisfied at every point $(x, y)$ of the $xy$-plane. Furthermore,

$$\frac{df}{dz} = \frac{\partial u}{\partial x} + i\frac{\partial v}{\partial x} = e^x\cos y + ie^x\sin y = e^x(\cos y + i\sin y) = e^x e^{iy} = e^{x+iy} = e^z,$$

and $e^z$ is entire as well.

(f) $f(z) = 1/z$.

The derivative can be found to be $f'(z) = -1/z^2$, which does not exist for $z = 0$. Thus, $z = 0$ is a singularity of $f(z)$. However, any other point is a regular point of $f$.

(g) $f(z) = 1/\sin z$.

This gives $df/dz = -\cos z/\sin^2 z$. Thus, $f$ has (infinitely many) singular points at $z = \pm n\pi$ for $n = 0, 1, 2, \ldots$. ■


Example 19.1.5 shows that any polynomial in $z$, as well as the exponential function $e^z$, is entire. Therefore, any product and/or sum of polynomials and $e^z$ will also be entire. We can build other entire functions. For instance, $e^{iz}$ and $e^{-iz}$ are entire functions; therefore, the complex trigonometric functions, defined by

$$\sin z = \frac{e^{iz} - e^{-iz}}{2i} \quad\text{and}\quad \cos z = \frac{e^{iz} + e^{-iz}}{2}, \tag{19.5}$$

are also entire functions. Problem 19.7 shows that $\sin z$ and $\cos z$ have only real zeros.

The complex hyperbolic functions can be defined similarly:

$$\sinh z = \frac{e^z - e^{-z}}{2} \quad\text{and}\quad \cosh z = \frac{e^z + e^{-z}}{2}. \tag{19.6}$$

Although the sum and product of entire functions are entire, the ratio need not be. For instance, if $f(z)$ and $g(z)$ are polynomials of degrees $m$ and $n$, respectively, with no common factor, then for $n > 0$ the ratio $f(z)/g(z)$ is not entire, because at the zeros of $g(z)$, which always exist, the derivative is not defined.
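The exponential definitions (19.5) and (19.6) can be compared directly with a library implementation. The sketch below codes them from $e^z$ alone and checks them against Python's `cmath.sin`, `cmath.cos`, `cmath.sinh`, and `cmath.cosh` at a few sample points (the points are arbitrary). It also shows one way the complex sine differs from the real one: $\sin(iy) = i\sinh y$ grows without bound.

```python
import cmath

def sin_def(z):
    return (cmath.exp(1j * z) - cmath.exp(-1j * z)) / 2j   # Eq. (19.5)

def cos_def(z):
    return (cmath.exp(1j * z) + cmath.exp(-1j * z)) / 2    # Eq. (19.5)

def sinh_def(z):
    return (cmath.exp(z) - cmath.exp(-z)) / 2              # Eq. (19.6)

def cosh_def(z):
    return (cmath.exp(z) + cmath.exp(-z)) / 2              # Eq. (19.6)

for z in (0.5 + 0j, 1 + 2j, -0.3 - 1.7j):
    print(z,
          abs(sin_def(z) - cmath.sin(z)), abs(cos_def(z) - cmath.cos(z)),
          abs(sinh_def(z) - cmath.sinh(z)), abs(cosh_def(z) - cmath.cosh(z)))

# Unlike the real sine, sin z is unbounded: sin(iy) = i sinh(y) grows like e**y / 2.
print(abs(sin_def(10j)))
```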
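The functions $u(x, y)$ and $v(x, y)$ of an analytic function have an interesting property which the following example investigates.

Example 19.1.6. The family of curves $u(x, y) = \text{constant}$ is perpendicular to the family of curves $v(x, y) = \text{constant}$ at each point of the complex plane where $f(z) = u + iv$ is analytic and $f'(z) \neq 0$.

This can easily be seen by looking at the normal to the curves. The normal to the curve $u(x,y) = \text{constant}$ is simply $\nabla u = (\partial u/\partial x, \partial u/\partial y)$ (see Theorem 12.3.2). Similarly, the normal to the curve $v(x,y) = \text{constant}$ is $\nabla v = (\partial v/\partial x, \partial v/\partial y)$. By (19.3), $|\nabla u| = |\nabla v| = |f'(z)|$, so neither normal vanishes when $f'(z) \neq 0$. Taking the dot product of these two normals, we obtain

$$(\nabla u)\cdot(\nabla v) = \frac{\partial u}{\partial x}\frac{\partial v}{\partial x} + \frac{\partial u}{\partial y}\frac{\partial v}{\partial y} = \frac{\partial u}{\partial x}\left(-\frac{\partial u}{\partial y}\right) + \frac{\partial u}{\partial y}\left(\frac{\partial u}{\partial x}\right) = 0$$

by the C-R conditions.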
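The orthogonality of the level curves can be checked at sample points. The sketch below estimates $\nabla u$ and $\nabla v$ by central differences and prints their dot product, which should be zero up to rounding and truncation error wherever $f'(z) \neq 0$. The function $f(z) = z^3 + e^z$, the sample points, and the step size are assumptions chosen only for illustration.

```python
import cmath

def grad(phi, x, y, h=1e-6):
    """Central-difference gradient (phi_x, phi_y) of a real function phi(x, y)."""
    return ((phi(x + h, y) - phi(x - h, y)) / (2 * h),
            (phi(x, y + h) - phi(x, y - h)) / (2 * h))

f = lambda z: z ** 3 + cmath.exp(z)          # an illustrative entire function
u = lambda x, y: f(complex(x, y)).real
v = lambda x, y: f(complex(x, y)).imag

for x, y in [(0.4, 0.9), (-1.2, 0.5), (2.0, -0.3)]:
    gu, gv = grad(u, x, y), grad(v, x, y)
    dot = gu[0] * gv[0] + gu[1] * gv[1]
    print((x, y), gu, gv, dot)               # dot should be close to 0
```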


One can safely say that rigorous complex analysis was founded by a single man: 
Cauchy. Augustin-Louis Cauchy was one of the most influential French mathematicians
of the nineteenth century. He began his career as a military engineer, but
when his health broke down in 1813 he followed his natural inclination and devoted 
himself wholly to mathematics. 

In mathematical productivity Cauchy was surpassed only by Euler, and his collected
works fill 27 fat volumes. He made substantial contributions to number theory
and determinants; is considered to be the originator of the theory of finite groups; 
and did extensive work in astronomy, mechanics, optics, and the theory of elasticity. 

His greatest achievements, however, lay in the field of analysis. Together with his 
contemporaries Gauss and Abel, he was a pioneer in the rigorous treatment of limits, 
continuous functions, derivatives, integrals, and infinite series. Several of the basic 
tests for the convergence of series are associated with his name. He also provided the 
first existence proof for solutions of differential equations, gave the first proof of the 
convergence of a Taylor series, and was the first to feel the need for a careful study 
of the convergence behavior of Fourier series. However, his most important work 
was in the theory of functions of a complex variable, which in essence he created and 
which has continued to be one of the dominant branches of both pure and applied 
mathematics. In this field, Cauchy’s integral theorem and Cauchy’s integral formula 
are fundamental tools without which modern analysis could hardly exist. 

Unfortunately, his personality did not harmonize with the fruitful power of his 
mind. He was an arrogant royalist in politics and a self-righteous, preaching, pious 
believer in religion (all this in an age of republican skepticism), and most of his
fellow scientists disliked him and considered him a smug hypocrite. It might be fairer 
to put first things first and describe him as a great mathematician who happened 
also to be a sincere but narrow-minded bigot. 


19.1.2 Integration of Complex Functions 

We have thus far discussed the derivative of a complex function. The concept 
of integration is even more important because, as we shall see later, derivatives 
can be written in terms of integrals. 


curves of constant 
u and v are 
perpendicular. 



Augustin-Louis 
Cauchy 1789-1857 
complex integrals
are 

path-dependent. 


The definite integral of a complex function is naively defined in analogy 
to that of a real function. However, a crucial difference exists: While in the 
real case, the limits of integration are real numbers and there is only one way 
to connect these two limits (along the real line), the limits of integration of 
a complex function are points in the complex plane and there are infinitely 
many ways to connect these two points. Thus, we speak of a definite integral 
of a complex function along a path. It follows that complex integrals are, in 
general, path-dependent. 


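$$\int_{\alpha_1}^{\alpha_2} f(z)\,dz = \lim_{\substack{N\to\infty\\ \Delta z_i\to 0}}\sum_{i=1}^{N} f(z_i)\,\Delta z_i, \tag{19.7}$$

where $\Delta z_i$ is a small segment, situated at $z_i$, of the curve that connects the complex number $\alpha_1$ to the complex number $\alpha_2$ in the $z$-plane (see Figure 19.3). An immediate consequence of this equation is

$$\left|\int_{\alpha_1}^{\alpha_2} f(z)\,dz\right| = \lim_{\substack{N\to\infty\\ \Delta z_i\to 0}}\left|\sum_{i=1}^{N} f(z_i)\,\Delta z_i\right| \le \lim_{\substack{N\to\infty\\ \Delta z_i\to 0}}\sum_{i=1}^{N}|f(z_i)|\,|\Delta z_i| = \int_{\alpha_1}^{\alpha_2}|f(z)|\,|dz|, \tag{19.8}$$

where we have used the triangle inequality as expressed in Equation (18.7).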

Since there are infinitely many ways of connecting $\alpha_1$ to $\alpha_2$, there is no guarantee that Equation (19.7) has a unique value: It is possible to obtain different values for the integral of some functions for different paths. It may seem that we should avoid such functions and that they will have no use in physical applications. Quite to the contrary, most functions encountered will not, in general, give the same result if we choose two completely arbitrary paths in the complex plane. In fact, the only (continuous) functions that give the same integral between any two points along any two paths in the whole plane are the entire functions; a single singularity, as for $1/z$ in Example 19.1.8(c) below, is enough to spoil path-independence. Because of the importance of paths in complex integration, we need the following definition:



Figure 19.3: One of the infinitely many paths connecting two complex points $\alpha_1$ and $\alpha_2$.

Box 19.1.2. A contour is a collection of connected smooth arcs. When the beginning point of the first arc coincides with the end point of the last one, and the contour does not otherwise intersect itself, the contour is said to be a simple closed contour (or just closed contour).


We encountered path-dependent integrals when we tried to evaluate the 
line integral of a vector field in Chapter 14. The same argument for path- 
independence can be used to prove (see Problem 19.21) 

Theorem 19.1.7. (Cauchy-Goursat Theorem). Let $f(z)$ be analytic on a simple closed contour $C$ and at all points inside $C$. Then

$$\oint_C f(z)\,dz = 0.$$

Equivalently, $\int_{\alpha_1}^{\alpha_2} f(z)\,dz$ is independent of the smooth path connecting $\alpha_1$ and $\alpha_2$ as long as the paths lie entirely in the region of analyticity of $f(z)$ and the region bounded by any two such paths contains no singularity of $f(z)$.


Example 19.1.8. We consider a few examples of definite integrals.

(a) Let us evaluate the integral $I_1 = \int_{\gamma_1} z\,dz$ where $\gamma_1$ is the straight line drawn from the origin to the point $(1,2)$ (see Figure 19.4). Along such a line $y = 2x$ and thus $\gamma_1(t) = t + 2it$ where $0 \le t \le 1$, and³

$$I_1 = \int_{\gamma_1} z\,dz = \int_0^1 (t+2it)(dt + 2i\,dt) = \int_0^1 (-3t\,dt + 4it\,dt) = -\frac{3}{2} + 2i.$$

For a different path $\gamma_2$, along which $y = 2x^2$, we get $\gamma_2(t) = t + 2it^2$ where $0 \le t \le 1$, and

$$I_1' = \int_{\gamma_2} z\,dz = \int_0^1 (t + 2it^2)(dt + 4it\,dt) = -\frac{3}{2} + 2i.$$

Therefore, $I_1 = I_1'$. This is what is expected from the Cauchy-Goursat theorem because the function $f(z) = z$ is analytic on the two paths and in the region bounded by them.

Figure 19.4: The three different paths of integration corresponding to the integrals $I_1$, $I_1'$, $I_2$, and $I_2'$.

³ We are using the parameterization $x = t$, $y = 2x = 2t$ for the curve.
(b) Now let us consider $I_2 = \int_{\gamma_1} z^2\,dz$ with $\gamma_1$ as in part (a). Substituting for $z$ in terms of $t$, we obtain

$$I_2 = \int_{\gamma_1}(t+2it)^2(dt + 2i\,dt) = (1+2i)^3\int_0^1 t^2\,dt = -\frac{11}{3} - \frac{2}{3}i.$$

Next we compare $I_2$ with $I_2' = \int_{\gamma_3} z^2\,dz$ where $\gamma_3$ is as shown in Figure 19.4. This path can be described by

$$\gamma_3(t) = \begin{cases} t & \text{for } 0 \le t \le 1, \\ 1 + i(t-1) & \text{for } 1 \le t \le 3. \end{cases}$$

Therefore,

$$I_2' = \int_0^1 t^2\,dt + \int_1^3 [1 + i(t-1)]^2(i\,dt) = \frac{1}{3} - 4 - \frac{2}{3}i = -\frac{11}{3} - \frac{2}{3}i,$$

which is identical to $I_2$, once again because the function is analytic on $\gamma_1$ and $\gamma_3$ as well as in the region bounded by them.

(c) An example of the case where equality for different paths is not attained is $I_3 = \int_{\gamma_4} dz/z$ where $\gamma_4$ is the upper semicircle of unit radius, as shown in Figure 19.5. A parametric equation for $\gamma_4$ can be given in terms of $\theta$:

$$\gamma_4(\theta) = \cos\theta + i\sin\theta = e^{i\theta} \;\Rightarrow\; dz = ie^{i\theta}\,d\theta, \qquad 0 \le \theta \le \pi.$$

Thus, we obtain

$$I_3 = \int_0^{\pi}\frac{1}{e^{i\theta}}\,ie^{i\theta}\,d\theta = i\pi.$$

On the other hand, for the lower semicircle $\gamma_4'$, traversed from $z = 1$ to $z = -1$,

$$I_3' = \int_{\gamma_4'}\frac{dz}{z} = \int_{2\pi}^{\pi}\frac{1}{e^{i\theta}}\,ie^{i\theta}\,d\theta = -i\pi.$$

Here the two integrals are not equal. From $\gamma_4$ and $\gamma_4'$ we can construct a counterclockwise simple closed contour $C$, along which the integral of $f(z) = 1/z$ becomes $\oint_C dz/z = I_3 - I_3' = 2i\pi$. That the integral is not zero is a consequence of the fact that $1/z$ is not analytic at all points of the region bounded by the closed contour $C$. ■

Figure 19.5: The two semicircular paths for calculating $I_3$ and $I_3'$.
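The integrals of Example 19.1.8 can be reproduced by discretizing the parameterized paths, using $\int_\gamma f(z)\,dz = \int f(\gamma(t))\,\gamma'(t)\,dt$. The sketch below applies the midpoint rule to that parameter integral; the helper name and the number of subintervals are assumptions. It shows $I_1 = I_1'$ and $I_2 = I_2'$ for the analytic integrands, and $I_3 \neq I_3'$ for $1/z$.

```python
import cmath
import math

def contour_integral(f, gamma, dgamma, t0, t1, n=20000):
    """Midpoint-rule approximation of the integral of f(z) dz along z = gamma(t), t from t0 to t1."""
    h = (t1 - t0) / n
    return h * sum(f(gamma(t0 + (k + 0.5) * h)) * dgamma(t0 + (k + 0.5) * h) for k in range(n))

z1 = lambda z: z
z2 = lambda z: z * z
inv = lambda z: 1 / z

I1 = contour_integral(z1, lambda t: t + 2j * t, lambda t: 1 + 2j, 0, 1)            # gamma_1
I1p = contour_integral(z1, lambda t: t + 2j * t * t, lambda t: 1 + 4j * t, 0, 1)   # gamma_2
I2 = contour_integral(z2, lambda t: t + 2j * t, lambda t: 1 + 2j, 0, 1)            # gamma_1
I2p = (contour_integral(z2, lambda t: t, lambda t: 1, 0, 1)                        # gamma_3, first leg
       + contour_integral(z2, lambda t: 1 + 1j * (t - 1), lambda t: 1j, 1, 3))     # gamma_3, second leg
I3 = contour_integral(inv, lambda t: cmath.exp(1j * t), lambda t: 1j * cmath.exp(1j * t), 0, math.pi)
I3p = contour_integral(inv, lambda t: cmath.exp(1j * t), lambda t: 1j * cmath.exp(1j * t),
                       2 * math.pi, math.pi)

print(I1, I1p)    # both approximately -3/2 + 2i
print(I2, I2p)    # both approximately -11/3 - 2i/3
print(I3, I3p)    # approximately i*pi and -i*pi
```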
Figure 19.6: A contour of integration can be deformed into another contour. The second contour is usually taken to be a circle because of the ease of its corresponding integration. (a) shows the original contour, and (b) shows the two contours as well as the (shaded) region between them in which the function is analytic.


The Cauchy-Goursat theorem applies to more complicated regions. When 
a region contains points at which f(z) is not analytic, those points can be 
avoided by redefining the region and the contour (Figure 19.6). Such a procedure
requires a convention regarding the direction of “motion” along the
contour. This convention is important enough to be stated separately. 


convention for 
positive sense of 
integration around 
a closed contour 


Box 19.1.3. ( Convention ). When integrating along a closed contour, 
we agree to traverse the contour in such a way that the region enclosed 
by the contour lies to our left. An integration that follows this convention 
is called integration in the positive sense. Integration performed in the 
opposite direction acquires a minus sign. 


Suppose that we want to evaluate the integral $\oint_C f(z)\,dz$ where $C$ is some contour in the complex plane [see Figure 19.6(a)]. Let $\Gamma$ be another contour (usually a simpler one, say a circle) which is either entirely inside or entirely outside $C$. Figure 19.6 illustrates the case where $\Gamma$ is entirely inside $C$. We assume that $\Gamma$ is such that $f(z)$ does not have any singularity in the region between $C$ and $\Gamma$. By connecting the two contours with a line as shown in Figure 19.6(b), we construct a composite closed contour consisting of $C$, $\Gamma$, and twice the line segment $L$, once in the positive direction and once in the negative. Within this composite contour, the function $f(z)$ is analytic. Therefore, by the Cauchy-Goursat theorem, we have

$$\oint_C f(z)\,dz - \oint_\Gamma f(z)\,dz + \int_L f(z)\,dz - \int_L f(z)\,dz = 0,$$

where both $\oint_C$ and $\oint_\Gamma$ denote counterclockwise integrals. The negative sign for $\Gamma$ is due to the convention above: keeping the shaded region to our left, we traverse the inner contour $\Gamma$ clockwise. It follows from this equation that the integral along $C$ is the same as that along the circle $\Gamma$. This result can be interpreted by saying that
why analytic
functions “remote
sense” their values
at distant points 


Box 19.1.4. We can always deform the contour of an integral in the complex
plane into a simpler contour, as long as in the process of deformation
we encounter no singularity of the function.
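Contour deformation can be illustrated numerically. In the sketch below, the integral of the (illustrative) function $f(z) = e^z/(z - i/2)$ around a square with corners $\pm 2 \pm 2i$ is compared with its integral around a small circle of radius $0.1$ about the only singularity $z = i/2$; deforming one contour into the other crosses no singularity, so the two numbers should agree. A function whose singularity lies outside the square, $1/(z-3)$, gives zero. The square, the radius, the grid sizes, and all helper names are assumptions for illustration.

```python
import cmath
import math

def segment_integral(f, a, b, n=4000):
    """Midpoint-rule integral of f(z) dz along the straight segment from a to b."""
    h = 1.0 / n
    return (b - a) * h * sum(f(a + (b - a) * (k + 0.5) * h) for k in range(n))

def polygon_integral(f, vertices):
    """Integral around the closed polygon through the given vertices, in the listed order."""
    return sum(segment_integral(f, vertices[k], vertices[(k + 1) % len(vertices)])
               for k in range(len(vertices)))

def circle_integral(f, center, r, n=400):
    """Counterclockwise integral around |z - center| = r using n equally spaced points."""
    total = 0j
    for k in range(n):
        w = r * cmath.exp(2j * math.pi * k / n)
        total += f(center + w) * 1j * w            # dz = i (z - center) dtheta
    return total * 2 * math.pi / n

f = lambda z: cmath.exp(z) / (z - 0.5j)            # singular only at z = i/2
square = [2 - 2j, 2 + 2j, -2 + 2j, -2 - 2j]        # counterclockwise, encloses i/2
print(polygon_integral(f, square), circle_integral(f, 0.5j, 0.1))
print(polygon_integral(lambda z: 1 / (z - 3), square))   # singularity outside: about 0
```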


19.1.3 Cauchy Integral Formula 

One extremely important consequence of the Cauchy-Goursat theorem, the centerpiece of complex analysis, is the Cauchy integral formula, which we state without proof.⁴

Theorem 19.1.9. Let $f(z)$ be analytic on and within a simple closed contour $C$ integrated in the positive sense. Let $z_0$ be any interior point of $C$. Then

$$f(z_0) = \frac{1}{2\pi i}\oint_C \frac{f(z)}{z - z_0}\,dz.$$

This is called the Cauchy integral formula (CIF).
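The CIF can be checked by evaluating its right-hand side numerically on a circle. Because the integrand is periodic in the polar angle, equally spaced points (the trapezoidal rule) converge very quickly. The sketch below compares $\frac{1}{2\pi i}\oint f(z)\,dz/(z-z_0)$ with $f(z_0)$ for a few entire functions, and shows that the integral is about zero when $z_0$ lies outside the circle, as the Cauchy-Goursat theorem requires. The unit circle, the point $z_0$, and the number of nodes are illustrative assumptions.

```python
import cmath
import math

def cauchy_formula_value(f, z0, center=0j, r=1.0, n=256):
    """(1 / 2 pi i) * contour integral of f(z) / (z - z0) dz over |z - center| = r,
    evaluated with n equally spaced points (trapezoidal rule in the angle)."""
    total = 0j
    for k in range(n):
        w = r * cmath.exp(2j * math.pi * k / n)
        z = center + w
        total += f(z) / (z - z0) * 1j * w          # dz = i w dtheta
    return total * (2 * math.pi / n) / (2j * math.pi)

z0 = 0.3 + 0.2j                                    # interior point of the unit circle
for f in (cmath.exp, cmath.sin, lambda z: z ** 5 - 2 * z):
    print(cauchy_formula_value(f, z0), f(z0))
print(cauchy_formula_value(cmath.exp, 2 + 0j))     # z0 outside the circle: about 0
```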


Example 19.1.10. We can use the CIF to evaluate the following integrals:

$$I_1 = \oint_{C_1} \frac{z^2\,dz}{(z^2+3)^2(z-i)}, \qquad I_2 = \oint_{C_2} \frac{(z^2-1)\,dz}{(z-\frac12)(z^2-4)^3}, \qquad I_3 = \oint_{C_3} \frac{e^{z/2}\,dz}{(z-i\pi)(z^2-20)^4},$$

where $C_1$, $C_2$, and $C_3$ are circles centered at the origin with radii $r_1 = \frac32$, $r_2 = 1$, and $r_3 = 4$, respectively.

For $I_1$ we note that $f(z) = z^2/(z^2+3)^2$ is analytic within and on $C_1$ (its singular points $z = \pm i\sqrt3$ lie outside $C_1$), and $z_0 = i$ lies in the interior of $C_1$. Thus,

$$I_1 = \oint_{C_1}\frac{f(z)\,dz}{z-i} = 2\pi i f(i) = 2\pi i\,\frac{i^2}{(i^2+3)^2} = -\frac{\pi i}{2}.$$

Similarly, $f(z) = (z^2-1)/(z^2-4)^3$ for $I_2$ is analytic on and within $C_2$, and $z_0 = \frac12$ is an interior point of $C_2$. Thus, the CIF gives

$$I_2 = \oint_{C_2}\frac{f(z)\,dz}{z-\frac12} = 2\pi i f\left(\tfrac12\right) = 2\pi i\,\frac{\frac14 - 1}{\left(\frac14-4\right)^3} = \frac{32\pi i}{1125}.$$

For the last integral, $f(z) = e^{z/2}/(z^2-20)^4$, and the interior point is $z_0 = i\pi$:

$$I_3 = \oint_{C_3}\frac{f(z)\,dz}{z-i\pi} = 2\pi i f(i\pi) = 2\pi i\,\frac{e^{i\pi/2}}{(-\pi^2-20)^4} = -\frac{2\pi}{(\pi^2+20)^4}.$$
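The three values above can be checked numerically. The sketch below is only an illustration. It assumes that the trapezoidal rule in the angle θ is an adequate quadrature for a smooth periodic integrand on a circle, which holds here because it converges very fast in that setting. The helper name `contour_integral` and the sample count are our own choices and are not part of the text.

```python
import cmath
import math


def contour_integral(f, center, radius, samples=2048):
    """Approximate the counterclockwise integral of f around |z - center| = radius.

    The circle is parametrized as z = center + radius * e^{i theta}, so that
    dz = i * radius * e^{i theta} d(theta); the theta-integral is done with
    the trapezoidal rule (equal weights on equally spaced angles).
    """
    total = 0j
    for k in range(samples):
        w = cmath.exp(2j * math.pi * k / samples)  # e^{i theta_k}
        total += f(center + radius * w) * 1j * radius * w
    return total * (2 * math.pi / samples)


# Integrands of Example 19.1.10; all three circles are centered at the origin.
I1 = contour_integral(lambda z: z**2 / ((z**2 + 3) ** 2 * (z - 1j)), 0, 1.5)
I2 = contour_integral(lambda z: (z**2 - 1) / ((z - 0.5) * (z**2 - 4) ** 3), 0, 1.0)
I3 = contour_integral(
    lambda z: cmath.exp(z / 2) / ((z - 1j * math.pi) * (z**2 - 20) ** 4), 0, 4.0
)

# Values predicted by the CIF, for comparison.
I1_cif = -math.pi * 1j / 2
I2_cif = 32 * math.pi * 1j / 1125
I3_cif = -2 * math.pi / (math.pi**2 + 20) ** 4

print(I1, I1_cif)
print(I2, I2_cif)
print(I3, I3_cif)
```

Each circle encloses exactly one zero of the factor $z - z_0$ and none of the other singular points, which is what the CIF requires.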


The CIF gives the value of an analytic function at every point inside a 
simple closed contour when it is given the value of the function only at points 
on the contour. It seems as though analytic functions have no freedom within 
a contour: They are not free to change inside a region once their value is 
fixed on the contour enclosing that region. There is an analogous situation in certain areas of physics, for example, electrostatics: The specification of the potential at the boundaries, such as conductors, automatically determines it at any other point in the region of space bounded by the conductors. This is the content of the uniqueness theorem (to be discussed later in this book) used in electrostatic boundary-value problems. However, the electrostatic potential Φ is bound by another condition, Laplace’s equation; and the combination of Laplace’s equation and the boundary conditions furnishes the uniqueness of Φ.

4 For a proof, see Hassani, S. Mathematical Physics: A Modern Introduction to Its Foundations, Springer-Verlag, 1999, Chapter 9.

It seems, on the other hand, as though the mere specification of an analytic function on a contour, without any other condition, is sufficient to determine the function’s value at all points enclosed within that contour. This impression is misleading. An analytic function, by its very definition, satisfies another restrictive condition: Its real and imaginary parts separately satisfy Laplace’s equation in two dimensions [see Equation (19.4)]. Thus, it should come as no surprise that the value of an analytic function at a boundary (contour) determines the function at all points inside the boundary.


19.1.4 Derivatives as Integrals 


The CIF is a very powerful tool for working with analytic functions. One of the applications of this formula is in evaluating the derivatives of such functions. It is convenient to change the dummy integration variable to ξ and write the CIF as

$$f(z) = \frac{1}{2\pi i}\oint_C \frac{f(\xi)\,d\xi}{\xi - z}, \tag{19.9}$$

where C is a simple closed contour in the ξ-plane and z is a point within C. By carrying the derivative inside the integral, we get

$$\frac{df}{dz} = \frac{1}{2\pi i}\frac{d}{dz}\oint_C \frac{f(\xi)\,d\xi}{\xi - z} = \frac{1}{2\pi i}\oint_C f(\xi)\,\frac{\partial}{\partial z}\left(\frac{1}{\xi - z}\right)d\xi = \frac{1}{2\pi i}\oint_C \frac{f(\xi)\,d\xi}{(\xi - z)^2}.$$


By repeated differentiation, we can generalize this formula to the nth derivative, and obtain

Theorem 19.1.11. The derivatives of all orders of an analytic function f(z) exist in the domain of analyticity of the function and are themselves analytic in that domain. The nth derivative of f(z) is given by

$$f^{(n)}(z) = \frac{d^n f}{dz^n} = \frac{n!}{2\pi i}\oint_C \frac{f(\xi)\,d\xi}{(\xi - z)^{n+1}}. \tag{19.10}$$


Example 19.1.12. Let us apply Equation (19.10) directly to some simple functions. In all cases, we will assume that the contour is a circle of radius r centered at z.

(a) Let f(z) = K, a constant. Then, for n = 1 we have

$$\frac{df}{dz} = \frac{1}{2\pi i}\oint_C \frac{K\,d\xi}{(\xi - z)^2}.$$

Since ξ is always on the circle C centered at z, $\xi - z = re^{i\theta}$ and $d\xi = ire^{i\theta}\,d\theta$. So we have

$$\frac{df}{dz} = \frac{1}{2\pi i}\int_0^{2\pi}\frac{Kire^{i\theta}\,d\theta}{(re^{i\theta})^2} = \frac{K}{2\pi r}\int_0^{2\pi} e^{-i\theta}\,d\theta = 0.$$

That is, the derivative of a constant is zero.

(b) Given f(z) = z, its first derivative will be

$$\frac{df}{dz} = \frac{1}{2\pi i}\oint_C \frac{\xi\,d\xi}{(\xi - z)^2} = \frac{1}{2\pi i}\int_0^{2\pi}\frac{(z + re^{i\theta})\,ire^{i\theta}\,d\theta}{(re^{i\theta})^2} = \frac{1}{2\pi}\left(\frac{z}{r}\int_0^{2\pi} e^{-i\theta}\,d\theta + \int_0^{2\pi} d\theta\right) = \frac{1}{2\pi}(0 + 2\pi) = 1.$$

(c) Given f(z) = z², for the first derivative Equation (19.10) yields

$$\frac{df}{dz} = \frac{1}{2\pi i}\oint_C \frac{\xi^2\,d\xi}{(\xi - z)^2} = \frac{1}{2\pi i}\int_0^{2\pi}\frac{(z + re^{i\theta})^2\,ire^{i\theta}\,d\theta}{(re^{i\theta})^2} = \frac{1}{2\pi}\int_0^{2\pi}\left[z^2 + (re^{i\theta})^2 + 2zre^{i\theta}\right](re^{i\theta})^{-1}\,d\theta$$

$$= \frac{1}{2\pi}\left(\frac{z^2}{r}\int_0^{2\pi} e^{-i\theta}\,d\theta + r\int_0^{2\pi} e^{i\theta}\,d\theta + 2z\int_0^{2\pi} d\theta\right) = \frac{1}{2\pi}(0 + 0 + 4\pi z) = 2z.$$

It can be shown that, in general, $(d/dz)z^m = mz^{m-1}$. The proof is left as Problem 19.24. ■
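Equation (19.10) also lends itself to numerical evaluation. The sketch below assumes, as in Example 19.1.12, that the contour is a circle of radius r centered at z. Substituting $\xi - z = re^{i\theta}$ turns Equation (19.10) into $f^{(n)}(z) = \frac{n!}{2\pi r^n}\int_0^{2\pi} f(z + re^{i\theta})e^{-in\theta}\,d\theta$, which is then approximated with the trapezoidal rule. The function name `nth_derivative`, the default radius, and the sample count are illustrative choices.

```python
import cmath
import math


def nth_derivative(f, z, n, radius=1.0, samples=256):
    """Approximate f^(n)(z) from the derivative formula (19.10).

    With xi - z = r e^{i theta}, the contour integral becomes
        n! / (2 pi r^n) * integral_0^{2 pi} f(z + r e^{i theta}) e^{-i n theta} d(theta),
    and the theta-integral is replaced by an equally weighted sum over
    `samples` angles (trapezoidal rule). f must be analytic on and inside
    the circle |xi - z| = radius.
    """
    acc = 0j
    for k in range(samples):
        theta = 2 * math.pi * k / samples
        acc += f(z + radius * cmath.exp(1j * theta)) * cmath.exp(-1j * n * theta)
    return math.factorial(n) * acc / (samples * radius**n)


# The three cases of Example 19.1.12, evaluated at an arbitrary point.
z0 = 0.7 - 1.2j
print(nth_derivative(lambda xi: 5.0, z0, 1))   # derivative of a constant
print(nth_derivative(lambda xi: xi, z0, 1))    # derivative of z
print(nth_derivative(lambda xi: xi**2, z0, 1), 2 * z0)
# Higher derivatives: every derivative of e^z is e^z.
print(nth_derivative(cmath.exp, z0, 4), cmath.exp(z0))
```

For polynomials of degree below `samples`, the discrete sum isolates the required coefficient exactly, so the only error is rounding. This mirrors the way the integrals of $e^{\pm i\theta}$ vanished in parts (a) to (c).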

The CIF is a central formula in complex analysis. However, due to space limitations, we cannot explore its full capability here. Nevertheless, one of its applications is worth discussing at this point. Suppose that f is a bounded entire function and consider

$$\frac{df}{dz} = \frac{1}{2\pi i}\oint_C \frac{f(\xi)\,d\xi}{(\xi - z)^2}.$$

Since f is analytic everywhere in the complex plane, the closed contour C can be chosen to be a very large circle of radius R with center at z. Taking the absolute value of both sides yields

$$\left|\frac{df}{dz}\right| = \left|\frac{1}{2\pi i}\int_0^{2\pi}\frac{f(z + Re^{i\theta})}{(Re^{i\theta})^2}\,iRe^{i\theta}\,d\theta\right| \le \frac{1}{2\pi}\int_0^{2\pi}\frac{|f(z + Re^{i\theta})|}{R}\,d\theta \le \frac{1}{2\pi}\int_0^{2\pi}\frac{M}{R}\,d\theta = \frac{M}{R},$$

where we used Equation (19.8) and $|e^{i\theta}| = 1$. M is the maximum of |f| in the complex plane. 5 Now as R → ∞, the derivative goes to zero. Since z was arbitrary, df/dz = 0 everywhere, and the only function whose derivative vanishes everywhere is the constant function. Thus

Box 19.1.5. A bounded entire function is necessarily a constant. 


5 M exists because f is assumed to be bounded.
There are many interesting and nontrivial real functions that are bounded and have derivatives (of all orders) on the entire real line. For instance, $e^{-x^2}$ is such a function. No such freedom exists for complex analytic functions according to Box 19.1.5! Any nontrivial analytic function is either not bounded (it grows without bound somewhere in the complex plane) or not entire [it is not analytic at some point(s) of the complex plane].
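The contrast drawn above can be seen numerically with $e^{-x^2}$ itself. On the real axis its values never exceed 1. Its entire extension $e^{-z^2}$, however, equals $e^{y^2}$ on the imaginary axis z = iy, so, consistent with Box 19.1.5, it is not bounded. The sketch below samples the maximum of $|e^{-z^2}|$ on circles of growing radius R. The sampling density is an arbitrary choice, and a finite sample only gives a lower estimate of the true maximum on each circle.

```python
import cmath
import math


def gaussian(z):
    """Entire extension of e^{-x^2} to the complex plane."""
    return cmath.exp(-z * z)


def max_modulus_on_circle(f, radius, samples=1024):
    """Largest |f| found at `samples` equally spaced points of |z| = radius."""
    return max(
        abs(f(radius * cmath.exp(2j * math.pi * k / samples))) for k in range(samples)
    )


# On the real segment [-10, 10] the function stays bounded by 1 ...
real_max = max(abs(gaussian(x / 100)) for x in range(-1000, 1001))
print("max on real axis:", real_max)

# ... but on circles in the complex plane the maximum grows like e^{R^2}.
for R in (1, 2, 3, 4, 5):
    print(R, max_modulus_on_circle(gaussian, R), math.exp(R**2))
```

Because $|e^{-z^2}| = e^{-R^2\cos 2\theta}$ on $|z| = R$, the maximum sits at θ = π/2 and 3π/2, which are among the sampled angles.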

A consequence of Proposition 19.1.5 is the fundamental theorem of algebra, which states that any polynomial of degree n ≥ 1 has n roots (some of which may be repeated). In other words, the polynomial

$$p(x) = a_0 + a_1x + \cdots + a_nx^n \quad\text{for } n \ge 1$$

can be factored completely as $p(x) = c(x - z_1)(x - z_2)\cdots(x - z_n)$, where c is a constant and the $z_i$ are, in general, complex numbers.

To see how Proposition 19.1.5 implies the fundamental theorem of algebra, we let f(z) = 1/p(z) and assume the contrary, i.e., that p(z) is never zero for any (finite) z. Then f(z) is analytic for all z. It is also bounded: since |p(z)| → ∞ as |z| → ∞, f(z) → 0 far from the origin, and f, being continuous, is bounded on the remaining closed disk. Proposition 19.1.5 then says that f(z) is a constant. This is obviously wrong, because p has degree at least 1. Thus, there must be at least one z, say $z = z_1$, for which p(z) is zero. So, we can factor out $(z - z_1)$ from p(z) and write $p(z) = (z - z_1)q(z)$, where q(z) is of degree n − 1. Applying the above argument to q(z), we have $p(z) = (z - z_1)(z - z_2)r(z)$, where r(z) is of degree n − 2. Continuing in this way, we can factor p(z) into linear factors. The last polynomial will be a constant (a polynomial of degree zero), which we have denoted as c.
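The proof above is not constructive: it guarantees the roots $z_1, \ldots, z_n$ without saying how to find them. As an illustration only, the sketch below computes them numerically with the Durand-Kerner (Weierstrass) iteration, a standard root-finding scheme that is not part of this chapter. It then checks the factorization $p(z) = c(z - z_1)\cdots(z - z_n)$ with $c = a_n$. The sketch assumes the roots are simple; for repeated roots the iteration converges much more slowly. The function names `poly_eval` and `durand_kerner` are hypothetical.

```python
def poly_eval(coeffs, z):
    """Evaluate a_0 + a_1 z + ... + a_n z^n by Horner's rule; coeffs = [a_0, ..., a_n]."""
    acc = 0j
    for a in reversed(coeffs):
        acc = acc * z + a
    return acc


def durand_kerner(coeffs, iterations=200):
    """Return (c, roots) with p(z) = c * prod(z - root) for p given by coeffs.

    Uses the Durand-Kerner (Weierstrass) simultaneous iteration on the monic
    polynomial p(z)/a_n. Assumes a_n != 0 and degree n >= 1.
    """
    c = coeffs[-1]
    monic = [a / c for a in coeffs]
    n = len(coeffs) - 1
    # Conventional starting points: distinct powers of a non-real number.
    roots = [(0.4 + 0.9j) ** k for k in range(n)]
    for _ in range(iterations):
        updated = []
        for i, zi in enumerate(roots):
            denom = 1 + 0j
            for j, zj in enumerate(roots):
                if j != i:
                    denom *= zi - zj
            updated.append(zi - poly_eval(monic, zi) / denom)
        roots = updated
    return c, roots


# p(z) = 2z^4 + 2z^3 - 2z^2 + 2z - 4 = 2(z - 1)(z + 2)(z - i)(z + i)
coeffs = [-4, 2, -2, 2, 2]
c, roots = durand_kerner(coeffs)
print(c, sorted(roots, key=lambda r: (round(r.real, 6), round(r.imag, 6))))

# Check the factorization at an arbitrary test point.
z = 0.3 + 1.7j
product = c
for r in roots:
    product *= z - r
print(poly_eval(coeffs, z), product)
```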


19.2 Problems 

19.1. Show that f(z) = z² maps a line that makes an angle α with the real axis of the z-plane onto a line in the w-plane which makes an angle 2α with the real axis of the w-plane. Hint: Use the trigonometric identity tan 2α = 2 tan α/(1 − tan² α).

19.2. Show that the function w = 1/z maps the straight line y = … in the z-plane onto a circle in the w-plane.

19.3. (a) Using the chain rule, find ∂f/∂z* and ∂f/∂z in terms of partial derivatives with respect to x and y.

(b) Evaluate ∂f/∂z* and ∂f/∂z assuming that the C-R conditions hold.

19.4. (a) Show that, when z is represented by polar coordinates, the C-R conditions on a function f(z) are

$$\frac{\partial U}{\partial r} = \frac{1}{r}\frac{\partial V}{\partial \theta}, \qquad \frac{\partial U}{\partial \theta} = -r\frac{\partial V}{\partial r},$$

where U and V are the real and imaginary parts of f(z) written in polar coordinates.

(b) Show that the derivative of f can be written as

$$\frac{df}{dz} = e^{-i\theta}\left(\frac{\partial U}{\partial r} + i\frac{\partial V}{\partial r}\right).$$

Hint: Start with the C-R conditions in Cartesian coordinates and apply the chain rule to them using x = r cos θ and y = r sin θ.


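19.5. Prove the following identities for differentiation by finding the real and imaginary parts of the function, u(x,y) and v(x,y), and differentiating them:

(a) $\dfrac{d}{dz}(f + g) = \dfrac{df}{dz} + \dfrac{dg}{dz}$.

(b) $\dfrac{d}{dz}(fg) = \dfrac{df}{dz}\,g + f\,\dfrac{dg}{dz}$.

(c) $\dfrac{d}{dz}\left(\dfrac{f}{g}\right) = \dfrac{f'(z)g(z) - g'(z)f(z)}{[g(z)]^2}$, where $g(z) \ne 0$.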


19.6. Show that d/dz(ln z) = 1/z. Hint: Find u(x,y) and v(x,y) for ln z using the exponential representation of z, then differentiate them.


19.7. Show that sin z and cos z have only real roots. Hint: Use the definition of sine and cosine in terms of exponentials.


19.8. Use mathematical induction and the product rule for differentiation to show that $\frac{d}{dz}(z^n) = nz^{n-1}$.

19.9. Use Equations (19.5) and (19.6) to establish the following identities:

(a) Re(sin z) = sin x cosh y, Im(sin z) = cos x sinh y.

(b) Re(cos z) = cos x cosh y, Im(cos z) = −sin x sinh y.

(c) Re(sinh z) = sinh x cos y, Im(sinh z) = cosh x sin y.

(d) Re(cosh z) = cosh x cos y, Im(cosh z) = sinh x sin y.

(e) |sin z|² = sin² x + sinh² y, |cos z|² = cos² x + sinh² y.

(f) |sinh z|² = sinh² x + sin² y, |cosh z|² = sinh² x + cos² y.


19.10. Find all the zeros of sinh z and cosh z.


19.11. Verify the following trigonometric identities:

(a) cos² z + sin² z = 1.

(b) cos(z₁ + z₂) = cos z₁ cos z₂ − sin z₁ sin z₂.

(c) sin(z₁ + z₂) = sin z₁ cos z₂ + cos z₁ sin z₂.

(d) sin(π/2 − z) = cos z, cos(π/2 − z) = sin z.

(e) cos 2z = cos² z − sin² z, sin 2z = 2 sin z cos z.

(f) $\tan(z_1 + z_2) = \dfrac{\tan z_1 + \tan z_2}{1 - \tan z_1 \tan z_2}$.

19.12. Verify the following hyperbolic identities:

(a) cosh² z − sinh² z = 1.

(b) cosh(z₁ + z₂) = cosh z₁ cosh z₂ + sinh z₁ sinh z₂.

(c) sinh(z₁ + z₂) = sinh z₁ cosh z₂ + cosh z₁ sinh z₂.

(d) cosh 2z = cosh² z + sinh² z, sinh 2z = 2 sinh z cosh z.

(e) $\tanh(z_1 + z_2) = \dfrac{\tanh z_1 + \tanh z_2}{1 + \tanh z_1 \tanh z_2}$.


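19.13. Show that

(a) $\tanh\dfrac{z}{2} = \dfrac{\sinh x + i\sin y}{\cosh x + \cos y}$, (b) $\coth\dfrac{z}{2} = \dfrac{\sinh x - i\sin y}{\cosh x - \cos y}$.

19.14. Prove the following identities:

(a) $\cos^{-1}z = -i\ln\left(z \pm \sqrt{z^2 - 1}\right)$. (b) $\sin^{-1}z = -i\ln\left(iz \pm \sqrt{1 - z^2}\right)$.

(c) $\tan^{-1}z = \dfrac{1}{2i}\ln\left(\dfrac{i - z}{i + z}\right)$. (d) $\cosh^{-1}z = \ln\left(z \pm \sqrt{z^2 - 1}\right)$.

(e) $\sinh^{-1}z = \ln\left(z \pm \sqrt{z^2 + 1}\right)$. (f) $\tanh^{-1}z = \dfrac12\ln\left(\dfrac{1 + z}{1 - z}\right)$.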

19.15. Prove that exp(z*) is not analytic anywhere.

19.16. Show that $e^{iz} = \cos z + i\sin z$ for any z.

19.17. Show that both the real and imaginary parts of an analytic function 
are harmonic. 

19.18. Show that each of the following functions (call each one u(x,y)) is harmonic, and find the function’s harmonic partner, v(x,y), such that u(x,y) + iv(x,y) is analytic. Hint: Use C-R conditions.

(a) $x^3 - 3xy^2$. (b) $e^x\cos y$. (c) $\dfrac{\ldots}{x^2 + y^2}$, where $x^2 + y^2 \ne 0$.

(d) $e^{-2y}\cos 2x$. (e) $e^{y^2 - x^2}\cos 2xy$.

(f) $e^x(x\cos y - y\sin y) + 2\sinh y\sin x + x^3 - 3xy^2 + y$.


19.19. Describe the curve defined by each of the following equations:

(a) z = 1 − it, 0 < t < 2. (b) $z = t + it^2$, −∞ < t < ∞.

(c) $z = a(\cos t + i\sin t)$, … < t < …. (d) z = t + …, −∞ < t < 0.


19.20. Let f(z) = w = u + iv. Suppose that $\dfrac{\partial^2\Phi}{\partial x^2} + \dfrac{\partial^2\Phi}{\partial y^2} = 0$. Show that if f is analytic, then $\dfrac{\partial^2\Phi}{\partial u^2} + \dfrac{\partial^2\Phi}{\partial v^2} = 0$. That is, analytic functions map harmonic functions in the z-plane to harmonic functions in the w-plane.
19.21. (a) Show that $\int f(z)\,dz$ can be written as

$$\int \mathbf{A}\cdot d\mathbf{r} + i\int \mathbf{B}\cdot d\mathbf{r},$$

where A = (u, −v, 0), B = (v, u, 0), and dr = (dx, dy, 0).

(b) Show that both A and B have vanishing curls when f is analytic.

(c) Now use Stokes’ theorem to prove the Cauchy-Goursat theorem.

19.22. Find the value of the integral $\int_C [(z + 2)/z]\,dz$, where C is: (a) the semicircle $z = 2e^{i\theta}$, for 0 ≤ θ ≤ π; (b) the semicircle $z = 2e^{i\theta}$, for π ≤ θ ≤ 2π; and (c) the circle $z = 2e^{i\theta}$, for −π ≤ θ ≤ π.

19.23. Evaluate the integral $\int_\gamma dz/(z - 1 - i)$, where γ is: (a) the line joining $z_1 = 2i$ and $z_2 = 3$; and (b) the path from $z_1$ to the origin and from there to $z_2$.

19.24. Use Equation (19.10) to show that $\frac{d}{dz}(z^m) = mz^{m-1}$. Hint: Use the binomial theorem.


19.25. Let C be the boundary of a square whose sides lie along the lines 
x = ±3 and y = ±3. For the positive sense of integration, evaluate each of 
the following integrals by using CIF or the derivative formula (19.10): 


(a) 

(d) 

(g) 

(j) 

(m) 


c z — i7r/2 
sinh 2 


dz. 


dz. 


C 


z ’ 


(b) 

(e) 


c z ( z2 + 10 ) 
cosh z 


■ dz. 


c 


dz. (c) j) 

(f) 


cos z 


c(z- !)(2 2 -10) 


dz. 


cos z 


dz. 


cos z 


c i z - *tt/2) 2 

dz. (k) 


dz. (h) <f e ~ dz. (i) ^dz 
Jc \ z - m Y Jc z + m 


c( z ~ * 7r ) 2 
sinh 2 

c ( z - in/2) 2 


dz. (1) j) 


cosh 2 

C ( 2 - tt/ 2 ) 2 


dz. 


C ( 2 - 2 )( 2 2 - 10 ) 


dz. 




Chapter 20 

Complex Series 


As in the real case, representation of functions by infinite series of “simpler” 
functions is an endeavor worthy of our serious consideration. We start with 
an examination of the properties of sequences and series of complex numbers 
and derive series representations of some complex functions. Most of the 
discussion is a direct generalization of the results of the real series. 

A sequence $\{z_k\}_{k=1}^\infty$ of complex numbers is said to converge to a limit z if $\lim_{k\to\infty}|z - z_k| = 0$. In other words, for each positive number ε there must exist an integer N such that $|z - z_k| < \epsilon$ whenever k > N. The reader may show that the real (imaginary) part of the limit of a sequence of complex numbers is the limit of the real (imaginary) part of the sequence. Series can be converted into sequences by partial summation. For instance, to study the infinite series $\sum_{k=1}^\infty z_k$, we form the partial sums $Z_n = \sum_{k=1}^n z_k$ and investigate the sequence $\{Z_n\}_{n=1}^\infty$. We thus say that the infinite series $\sum_{k=1}^\infty z_k$ converges to Z if $\lim_{n\to\infty} Z_n = Z$.

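Example 20.0.1. A series that is used often in analysis is the geometric series $Z = \sum_{k=0}^\infty z^k$. Let us show that this series converges to 1/(1 − z) for |z| < 1. For a partial sum of n terms, we have

$$Z_n = \sum_{k=0}^{n-1} z^k = \frac{1 - z^n}{1 - z},$$

which follows by multiplying $Z_n$ by 1 − z and noting that all intermediate terms cancel. Since $|z^n| = |z|^n \to 0$ as n → ∞ when |z| < 1, we get $\lim_{n\to\infty} Z_n = 1/(1 - z)$.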




516 


Complex Series 


absolute 

convergence 


power series 


If the series $\sum_{k=0}^\infty z_k$ converges, both the real part, $\sum_{k=0}^\infty x_k$, and the imaginary part, $\sum_{k=0}^\infty y_k$, of the series also converge. From Chapter 9, we know that a necessary condition for the convergence of the real series $\sum x_k$ and $\sum y_k$ is that $x_k \to 0$ and $y_k \to 0$. Thus, a necessary condition for the convergence of the complex series is $\lim_{k\to\infty} z_k = 0$. The terms of such a series are, therefore, bounded. Thus, there exists a positive number M such that $|z_k| < M$ for all k.

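A complex series is said to converge absolutely if the real series

$$\sum_{k=0}^\infty |z_k| = \sum_{k=0}^\infty \sqrt{x_k^2 + y_k^2}$$

converges. Absolute convergence implies convergence: since $|x_k| \le |z_k|$ and $|y_k| \le |z_k|$, the comparison test shows that $\sum x_k$ and $\sum y_k$ converge absolutely, hence converge, and so does $\sum z_k$.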
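To see partial sums at work, the short sketch below, which is an illustration and not part of the text, computes partial sums of the geometric series of Example 20.0.1 at a sample point with |z| < 1. It compares them with the closed form $(1 - z^n)/(1 - z)$ and with the limit $1/(1 - z)$. It also sums $|z|^k$ to show that the convergence is absolute, with limit $1/(1 - |z|)$. The sample point and the term counts are arbitrary choices.

```python
def partial_sum(terms):
    """Running total of an iterable of (complex or real) terms."""
    total = 0
    for t in terms:
        total += t
    return total


z = 0.6 + 0.5j              # |z| = sqrt(0.61), about 0.78 < 1
limit = 1 / (1 - z)

for n in (5, 20, 80):
    Zn = partial_sum(z**k for k in range(n))              # Z_n = sum_{k=0}^{n-1} z^k
    closed_form = (1 - z**n) / (1 - z)
    abs_sum = partial_sum(abs(z) ** k for k in range(n))  # sum of |z_k|
    print(n, abs(Zn - closed_form), abs(Zn - limit), abs_sum)

print("1/(1 - |z|) =", 1 / (1 - abs(z)))
```

The gap between $Z_n$ and the limit equals $|z|^n/|1 - z|$, so it shrinks geometrically, as the closed form predicts.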


20.1 Power Series 


We now concentrate on the power series which, as in the real case, are infinite sums of powers of $(z - z_0)$. It turns out, as we shall see shortly, that for complex functions the inclusion of negative powers is crucial.

Theorem 20.1.1. If the power series $\sum_{k=0}^\infty a_k(z - z_0)^k$ converges for $z_1$ (assumed to be different from $z_0$), then it converges absolutely for every value of z such that $|z - z_0| < |z_1 - z_0|$. Similarly, if the power series $\sum_{k=0}^\infty b_k/(z - z_0)^k$ converges for $z_2 \ne z_0$, then it converges absolutely for every value of z such that $|z - z_0| > |z_2 - z_0|$.

Proof. We prove the first part of the theorem; the second part is done similarly. Since the series converges for $z = z_1$, all the terms $|a_k(z_1 - z_0)^k|$ are smaller than a positive number M. We, therefore, have

$$\sum_{k=0}^\infty |a_k(z - z_0)^k| = \sum_{k=0}^\infty \left|a_k(z_1 - z_0)^k\,\frac{(z - z_0)^k}{(z_1 - z_0)^k}\right| = \sum_{k=0}^\infty |a_k(z_1 - z_0)^k|\left|\frac{z - z_0}{z_1 - z_0}\right|^k < \sum_{k=0}^\infty MB^k = M\sum_{k=0}^\infty B^k = \frac{M}{1 - B},$$

where $B = |(z - z_0)/(z_1 - z_0)|$ is a positive real number less than 1. Since the RHS is a finite (positive) number, the series of absolute values converges, and the proof is complete. □


The essence of Theorem 20.1.1 is that if a power series with positive powers converges for a point at a distance $r_1$ from $z_0$, then it converges for all interior points of a circle of radius $r_1$ centered at $z_0$. Similarly, if a power series with negative powers converges for a point at a distance $r_2$ from $z_0$, then it converges for all exterior points of a circle of radius $r_2$ centered at $z_0$ (see Figure 20.1).
Figure 20.1: (a) Power series with positive exponents converge for the interior points of a circle. (b) Power series with negative exponents converge for the exterior points of a circle.


Box 20.1.1. When constructing power series, positive powers are used for points inside a circle and negative powers for points outside it.


The largest circle about $z_0$ such that the first power series of Theorem 20.1.1 converges is called the circle of convergence of the power series. It follows from Theorem 20.1.1 that the series cannot converge at any point outside the circle of convergence. (Why?)

Let us consider the power series

$$S(z) = \sum_{k=0}^\infty a_k(z - z_0)^k, \tag{20.1}$$

which we assume to be convergent at all points interior to a circle for which $|z - z_0| = r$. This implies that the sequence of partial sums $\{S_n(z)\}_{n=0}^\infty$ converges. Therefore, for any ε > 0, there exists an integer $N_\epsilon$ such that

$$|S(z) - S_n(z)| < \epsilon \quad\text{whenever } n > N_\epsilon.$$

In general, the integer $N_\epsilon$ may be dependent on z; that is, for different values of z, we may be forced to pick different $N_\epsilon$'s. When $N_\epsilon$ is independent of z, we say that the convergence is uniform. We state the following result without proof:

Theorem 20.1.2. The power series $S(z) = \sum_{n=0}^\infty a_n(z - z_0)^n$ is uniformly convergent in every closed disk lying inside its circle of convergence, and S(z) is an analytic function of z within that circle. Furthermore, such a series can be differentiated and integrated term by term:

$$\frac{dS}{dz} = \sum_{n=1}^\infty na_n(z - z_0)^{n-1}, \qquad \int_\gamma S(z)\,dz = \sum_{n=0}^\infty a_n\int_\gamma (z - z_0)^n\,dz,$$

at each point z and each path γ located inside the circle of convergence of the power series.

By substituting the reciprocal of $(z - z_0)$ in the power series, we can show that if $\sum_{k=0}^\infty b_k/(z - z_0)^k$ is convergent in the annulus $r_2 < |z - z_0| < r_1$, then it is uniformly convergent in every closed annulus lying inside that region, and the series represents a continuous function of z there.


20.2 Taylor and Laurent Series 

Complex series, just as their real counterparts, find their most frequent utility 
in representing well-behaved functions. The following theorem, which we state 
without proof, 1 is essential in the application of complex analysis. 

Theorem 20.2.1. Let $C_1$ and $C_2$ be circles of radii $r_1$ and $r_2$, both centered at $z_0$ in the z-plane with $r_1 > r_2$. Let f(z) be analytic on $C_1$ and $C_2$ and throughout S, the annular region between the two circles. Then, at each point z of S, f(z) is given uniquely by the Laurent series

$$f(z) = \sum_{n=-\infty}^\infty a_n(z - z_0)^n, \quad\text{where}\quad a_n = \frac{1}{2\pi i}\oint_C \frac{f(\xi)}{(\xi - z_0)^{n+1}}\,d\xi,$$

and C is any contour within S that encircles $z_0$. When $r_2 = 0$, the series is called Taylor series. In that case $a_n = 0$ for negative n and $a_n = f^{(n)}(z_0)/n!$ for n ≥ 0.

We can see the reduction of the Laurent series to Taylor series as follows. The Laurent expansion is convergent as long as $r_2 < |z - z_0| < r_1$. In particular, if $r_2 = 0$, and if the function is analytic throughout the interior of the larger circle, then $f(\xi)/(\xi - z_0)^{n+1}$ will be analytic for negative integer n, and the integral will be zero by the Cauchy-Goursat theorem. Therefore, $a_n$ will be zero for n = −1, −2, .... Thus, only nonnegative powers of $(z - z_0)$ will be present in the series, and we obtain the Taylor series.

For $z_0 = 0$, the Taylor series reduces to the Maclaurin series:

$$f(z) = f(0) + f'(0)z + \cdots = \sum_{n=0}^\infty \frac{f^{(n)}(0)}{n!}z^n.$$

Box 19.1.4 tells us that we can enlarge $C_1$ and shrink $C_2$ until we encounter a point at which f is no longer analytic. Thus, we can include all the possible analytic points by enlarging $C_1$ and shrinking $C_2$.

1 For a proof, see Hassani, S. Mathematical Physics: A Modern Introduction to Its Foundations, Springer-Verlag, 1999, Section 9.6.

Example 20.2.2. Let us expand some functions in terms of series. For entire functions there is no point in the entire complex plane at which they are not analytic. Thus, only positive powers of $(z - z_0)$ will be present, and we will have a Taylor expansion that is valid for all values of z.

(a) We expand $e^z$ around $z_0 = 0$. The nth derivative of $e^z$ is $e^z$. Thus, $f^{(n)}(0) = 1$, and the Taylor (Maclaurin) expansion gives

$$e^z = \sum_{n=0}^\infty \frac{f^{(n)}(0)}{n!}z^n = \sum_{n=0}^\infty \frac{z^n}{n!}.$$

(b) The Maclaurin series for sin z is obtained by noting that

$$\left.\frac{d^n}{dz^n}\sin z\right|_{z=0} = \begin{cases} 0 & \text{if } n \text{ is even,}\\ (-1)^{(n-1)/2} & \text{if } n \text{ is odd,}\end{cases}$$

and substituting this in the Maclaurin expansion:

$$\sin z = \sum_{n\ \text{odd}} (-1)^{(n-1)/2}\frac{z^n}{n!} = \sum_{k=0}^\infty (-1)^k\frac{z^{2k+1}}{(2k+1)!}.$$

Similarly, we can obtain

$$\cos z = \sum_{k=0}^\infty (-1)^k\frac{z^{2k}}{(2k)!}, \qquad \sinh z = \sum_{k=0}^\infty \frac{z^{2k+1}}{(2k+1)!}, \qquad \cosh z = \sum_{k=0}^\infty \frac{z^{2k}}{(2k)!}.$$

It is seen that the series representation of all these functions is obtained by replacing the real variable x in their real series representation with a complex variable z.

(c) The function 1/(1 + z) is not entire, so the region of its convergence is limited. Let us find the Maclaurin expansion of this function. Starting from the origin ($z_0 = 0$), the function is analytic within all circles of radii r < 1. At r = 1 we encounter a singularity, the point z = −1. Thus, the series converges for all points z for which |z| < 1. 2 For such points we have

$$f^{(n)}(0) = \left.\frac{d^n}{dz^n}\left[(1 + z)^{-1}\right]\right|_{z=0} = (-1)^n n!.$$

Thus,

$$\frac{1}{1 + z} = \sum_{n=0}^\infty \frac{f^{(n)}(0)}{n!}z^n = \sum_{n=0}^\infty (-1)^n z^n.$$
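A quick numerical check of these expansions, offered as a sketch with arbitrarily chosen term counts, compares truncated Maclaurin series with Python's `cmath` implementations of $e^z$ and $\sin z$, and with $1/(1+z)$ inside the unit circle. For a point outside the unit circle, the terms of the last series do not even tend to zero, in line with the divergence noted in footnote 2.

```python
import cmath
import math


def exp_series(z, terms=40):
    """Truncated Maclaurin series sum_{n < terms} z^n / n!."""
    return sum(z**n / math.factorial(n) for n in range(terms))


def sin_series(z, terms=40):
    """Truncated series sum_{k < terms} (-1)^k z^(2k+1) / (2k+1)!."""
    return sum((-1) ** k * z ** (2 * k + 1) / math.factorial(2 * k + 1) for k in range(terms))


def geometric_alternating(z, terms=200):
    """Truncated series sum_{n < terms} (-1)^n z^n, valid only for |z| < 1."""
    return sum((-z) ** n for n in range(terms))


w = 1.0 + 2.0j
print(exp_series(w), cmath.exp(w))
print(sin_series(w), cmath.sin(w))
print(geometric_alternating(0.3 - 0.6j), 1 / (1 + (0.3 - 0.6j)))
# Outside |z| < 1 the terms grow instead of shrinking:
print([abs((-1.5) ** n) for n in (10, 20, 40)])
```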


The Taylor and Laurent series allow us to express an analytic function as a power series. For a Taylor series of f(z) the expansion is routine because the coefficient of its nth term is simply $f^{(n)}(z_0)/n!$, where $z_0$ is the center of the circle of convergence. However, when a Laurent series is applicable in a given region of the complex plane, the nth coefficient is not, in general, easy to evaluate. Usually it can be found by inspection and certain manipulations of other known series. Then the uniqueness of Laurent series expansion assures us that the series so obtained is the unique Laurent series for the function in that region. 3

2 As remarked before, the series diverges for all points outside the circle |z| = 1. This does not mean that the function cannot be represented by a series for points outside the circle. On the contrary, we shall see shortly that the Laurent series, with negative powers, is designed precisely for such a purpose.

3 See Hassani, S. Mathematical Physics: A Modern Introduction to Its Foundations, Springer-Verlag, 1999, p. 258.


As in the case of real series, 


Box 20.2.1. We can add, subtract, and multiply convergent power series. 
Furthermore, if the denominator does not vanish in a neighborhood of a 
point $z_0$, then we can obtain the Laurent series of the ratio of two power series about $z_0$ by long division.


Thus converging power series can be manipulated as though they were finite sums (polynomials). Such manipulations are extremely useful when dealing with Taylor and Laurent expansions in which the straightforward calculation of coefficients may be tedious. The following examples illustrate the power of infinite-series arithmetic. In these examples, the following equations are very useful:

$$\frac{1}{1 - z} = \sum_{n=0}^\infty z^n, \qquad \frac{1}{1 + z} = \sum_{n=0}^\infty (-1)^n z^n, \qquad |z| < 1. \tag{20.2}$$


Example 20.2.3. To expand the function $f(z) = \dfrac{2 + 3z}{z^2 + z^3}$ in a Laurent series about z = 0, rewrite it as

$$f(z) = \frac{1}{z^2}\,\frac{2 + 3z}{1 + z} = \frac{1}{z^2}\left(3 - \frac{1}{1 + z}\right) = \frac{1}{z^2}\left(3 - 1 + z - z^2 + z^3 - \cdots\right) = \frac{2}{z^2} + \frac{1}{z} - 1 + z - \cdots.$$

This series converges for 0 < |z| < 1. We note that negative powers of z are also present. This is a reflection of the fact that the function is not analytic inside the entire circle |z| = 1; it diverges at z = 0. ■


Example 20.2.4. The function $f(z) = z/[(z - 1)(z - 2)]$ has a Taylor expansion around the origin for |z| < 1. To find this expansion, we write 4

$$f(z) = \frac{z}{(z - 1)(z - 2)} = \frac{1}{1 - z} - \frac{1}{1 - z/2}.$$

Expanding both fractions in geometric series (both |z| and |z/2| are less than 1), we obtain $f(z) = \sum_{n=0}^\infty z^n - \sum_{n=0}^\infty (z/2)^n$. Adding the two series yields

$$f(z) = \sum_{n=0}^\infty \left(1 - 2^{-n}\right)z^n \quad\text{for } |z| < 1.$$

This is the unique Taylor expansion of f(z) within the circle |z| = 1.

4 We could, of course, evaluate the derivatives of all orders of the function at z = 0 and use the Maclaurin formula. However, the present method gives the same result much more quickly.

For the annular region 1 < |z| < 2 we have a Laurent series. This can be seen by noting that

$$f(z) = \frac{1/z}{1/z - 1} - \frac{1}{1 - z/2} = -\frac{1}{z}\left(\frac{1}{1 - 1/z}\right) - \frac{1}{1 - z/2}.$$

Since both fractions on the RHS are analytic in the annular region (|1/z| < 1, |z/2| < 1), we get

$$f(z) = -\frac{1}{z}\sum_{n=0}^\infty z^{-n} - \sum_{n=0}^\infty 2^{-n}z^n = -\sum_{n=-\infty}^{-1} z^n - \sum_{n=0}^\infty 2^{-n}z^n = \sum_{n=-\infty}^\infty a_n z^n,$$

where $a_n = -1$ for n < 0 and $a_n = -2^{-n}$ for n ≥ 0. This is the unique Laurent expansion of f(z) in the given region.

Finally, for |z| > 2 we have only negative powers of z. We obtain the expansion in this region by rewriting f(z) as follows:

$$f(z) = -\frac{1/z}{1 - 1/z} + \frac{2/z}{1 - 2/z}.$$

Expanding the fractions yields

$$f(z) = -\sum_{n=0}^\infty z^{-n-1} + \sum_{n=0}^\infty 2^{n+1}z^{-n-1} = \sum_{n=0}^\infty \left(2^{n+1} - 1\right)z^{-n-1}.$$

This is again the unique expansion of f(z) in the region |z| > 2.
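The coefficient formula of Theorem 20.2.1 can be evaluated numerically to confirm that the same function has three different expansions. The sketch below uses the trapezoidal rule on the circle $|\xi| = \rho$ as its quadrature, an assumption that works well here. The radii $\rho = 0.5$, $1.5$, and $3$ place the contour in the regions $|z| < 1$, $1 < |z| < 2$, and $|z| > 2$, and the computed $a_n$ are compared with the coefficients found in Example 20.2.4. The helper names and the sample count are our own.

```python
import cmath
import math


def f(z):
    return z / ((z - 1) * (z - 2))


def laurent_coefficient(func, n, rho, z0=0, samples=512):
    """a_n = (1/2 pi i) * contour func(xi) / (xi - z0)^(n+1) d(xi) on |xi - z0| = rho.

    With xi - z0 = rho * w, w = e^{i theta}, this equals
    (1 / 2 pi) * integral func(z0 + rho w) w^(-n) rho^(-n) d(theta),
    approximated by the trapezoidal rule.
    """
    acc = 0j
    for k in range(samples):
        w = cmath.exp(2j * math.pi * k / samples)
        acc += func(z0 + rho * w) * w ** (-n)
    return acc / (samples * rho**n)


def expected(n, region):
    """Coefficient of z^n found in Example 20.2.4 for the given region."""
    if region == "inner":     # |z| < 1
        return 1 - 2.0 ** (-n) if n >= 0 else 0.0
    if region == "annulus":   # 1 < |z| < 2
        return -1.0 if n < 0 else -(2.0 ** (-n))
    return 2.0 ** (-n) - 1 if n < 0 else 0.0  # |z| > 2: z^(-m) has 2^m - 1


for region, rho in (("inner", 0.5), ("annulus", 1.5), ("outer", 3.0)):
    for n in range(-3, 4):
        a_n = laurent_coefficient(f, n, rho)
        print(region, n, round(a_n.real, 10), expected(n, region))
```

Only the radius of the contour changes between the three runs, which is exactly the point of the example: the coefficients depend on the region, not on the function alone.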


The example above shows that a single function may have different series 
representations in different regions of the complex plane, each series having 
its own region of convergence. 

Example 20.2.5. Define f(z) as

$$f(z) = \begin{cases}(1 - \cos z)/z^2 & \text{for } z \ne 0,\\ \tfrac12 & \text{for } z = 0.\end{cases}$$

We can show that f(z) is an entire function.

Since 1 − cos z and z² are entire functions, their ratio is analytic everywhere except at the zeros of its denominator. The only such zero is z = 0. Thus, f(z) is analytic everywhere except possibly at z = 0. To see the behavior of f(z) at z = 0, we look at its Maclaurin series:

$$1 - \cos z = 1 - \sum_{n=0}^\infty (-1)^n\frac{z^{2n}}{(2n)!} = \sum_{n=1}^\infty (-1)^{n+1}\frac{z^{2n}}{(2n)!},$$

which implies that

$$\frac{1 - \cos z}{z^2} = \sum_{n=1}^\infty (-1)^{n+1}\frac{z^{2n-2}}{(2n)!} = \frac12 - \frac{z^2}{4!} + \frac{z^4}{6!} - \cdots.$$

The expansion on the RHS shows that the value of the series at z = 0 is 1/2, which, by definition, is f(0). Thus, the series converges for all z, and Box 20.1.2 says that f(z) is entire. ■
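The removable singularity of Example 20.2.5 also shows up numerically. The sketch below, with an arbitrarily chosen term count, sums the series for $(1 - \cos z)/z^2$ and compares it with the closed form. At z = 0 the series gives 1/2 directly. Near z = 0, the closed form suffers from cancellation in $1 - \cos z$, so the series is the more accurate way to evaluate f there.

```python
import cmath
import math


def f_series(z, terms=30):
    """sum_{n=1}^{terms} (-1)^(n+1) z^(2n-2) / (2n)!, the Maclaurin series of f."""
    return sum(
        (-1) ** (n + 1) * z ** (2 * n - 2) / math.factorial(2 * n)
        for n in range(1, terms + 1)
    )


def f_closed(z):
    """(1 - cos z)/z^2, with the value 1/2 assigned at z = 0 as in the example."""
    if z == 0:
        return 0.5
    return (1 - cmath.cos(z)) / z**2


print(f_series(0), f_closed(0))                # both give 1/2
print(f_series(2 + 1j), f_closed(2 + 1j))      # agree away from the origin
small = 1e-6j
print(abs(f_series(small) - 0.5), abs(f_closed(small) - 0.5))
```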
A Laurent series can give information about the integral of a function around a closed contour in whose interior the function may not be analytic. In fact, the coefficient of the first negative power in a Laurent series is given by

$$a_{-1} = \frac{1}{2\pi i}\oint_C f(\xi)\,d\xi. \tag{20.3}$$

Thus,

Box 20.2.2. To find the integral of a (nonanalytic) function around a closed contour surrounding $z_0$, write the Laurent series for the function and read off $a_{-1}$, the coefficient of the $1/(z - z_0)$ term. The integral is $2\pi i a_{-1}$.


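Example 20.2.6. As an illustration of this idea, let us evaluate the integral $I = \oint_C dz/[z^2(z - 2)]$, where C is a circle of radius 1 centered at the origin. The function is analytic in the annular region 0 < |z| < 2. We can, therefore, expand it as a Laurent series about z = 0 in that region:

$$\frac{1}{z^2(z - 2)} = -\frac{1}{2z^2}\,\frac{1}{1 - z/2} = -\frac{1}{2z^2}\sum_{n=0}^\infty\left(\frac{z}{2}\right)^n = -\frac{1}{2z^2} - \frac{1}{4z} - \frac18 - \cdots.$$

Thus, $a_{-1} = -\frac14$, and $\oint_C dz/[z^2(z - 2)] = 2\pi i a_{-1} = -\pi i/2$. Evaluating the integral directly, for instance by parametrizing the circle, would be considerably more laborious. ■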
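Box 20.2.2 can be checked on Example 20.2.6 numerically. The sketch below assumes a trapezoidal-rule quadrature on the unit circle and computes $\oint_C dz/[z^2(z - 2)]$ directly, comparing the result with $2\pi i a_{-1} = -\pi i/2$. It also checks a second route through the derivative formula (19.10): since $1/[z^2(z-2)] = g(z)/z^2$ with $g(z) = 1/(z - 2)$ analytic inside C, the integral equals $2\pi i\,g'(0)$. The helper name `contour_integral` is our own.

```python
import cmath
import math


def contour_integral(f, center, radius, samples=1024):
    """Trapezoidal-rule approximation of the counterclockwise integral of f
    around the circle |z - center| = radius."""
    total = 0j
    for k in range(samples):
        w = cmath.exp(2j * math.pi * k / samples)
        total += f(center + radius * w) * 1j * radius * w
    return total * (2 * math.pi / samples)


integral = contour_integral(lambda z: 1 / (z**2 * (z - 2)), 0, 1.0)
a_minus_1 = -0.25                       # coefficient of 1/z found in Example 20.2.6
print(integral, 2j * math.pi * a_minus_1)

# Same value from g(z) = 1/(z - 2): g'(z) = -1/(z - 2)^2, so g'(0) = -1/4.
g_prime_0 = -1 / (0 - 2) ** 2
print(2j * math.pi * g_prime_0)
```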


20.3 Problems 


20.1. Expand sinh z in a Taylor series about the point z = iπ.


20.2. Let C be the circle \z — i\ = 3 integrated in the positive sense. Find 
the value of each of the following integrals using the CIF or the derivative 
formula (19.10): 


(a) 





sinh z 
(z^ + n 2 ) 2 


dz. 


(e) 


cosh z ; 
(z 2 + 7r 2 ) 3 dZ • 



dz 

z 2 +9 
z 2 - 3z + 4 
z 2 — 4z + 3 


dz. 


20.3. For 0 < r < 1, show that

$$\sum_{k=0}^\infty r^k\cos k\theta = \frac{1 - r\cos\theta}{1 + r^2 - 2r\cos\theta} \quad\text{and}\quad \sum_{k=0}^\infty r^k\sin k\theta = \frac{r\sin\theta}{1 + r^2 - 2r\cos\theta}.$$

20.4. Find the Taylor expansion of 1/z² for points inside the circle |z − 2| = 2.

20.5. Use mathematical induction to show that

$$\left.\frac{d^n}{dz^n}\left(\frac{1}{1 + z}\right)\right|_{z=0} = (-1)^n n!.$$


20.6. Find the (unique) Laurent expansion of each of the following functions 
in each of its regions of analyticity: 


(a) 

(e) 

(i) 


1 


(z- 2)(z-3)' 

1 


( l -*) 3 

z 


(f) 


z 2 -l 



(d) 

z 2 — 4 

(h) 

(g) z 2 — 9 


sinh z — z 
? ' 
1 

(z 2 — l) 2 ' 


2-1 


20.7. Show that the following functions are entire: 

(a) f(z) = 


for 2 ^ 0 , 


sinz f / o 
(b) f(z) = ) z ^ ’ 


for 2 = 0. (1 

C 0 S 2 for 2 ± 7 t/ 2 , 


for 2 = 0. 


(C) /(2) = ^ - * / 4 

— 1 / 7T 


for 2 = ±7t/2. 


20.8. Obtain the first few nonzero terms of the Laurent-series expansion of each of the following functions about the origin by approximating the denominator by a polynomial and using the technique of long division of polynomials. Also find the integral of the function along a small simple closed contour encircling the origin.


( a ) - 

(e) 


1 


sin 2 
1 


e z -1 


(b) 

(f) 


1 


1 — COS 2 

1 


(c) 

(g) 


1 — cosh z 

^4 


(d) 


z — sm z 


6z + z 3 — 6 sinh z 


20.9. Obtain the Laurent-series expansion of f(z) = sinh z/z 3 about the 
origin. 




Chapter 21 

Calculus of Residues 


One of the most powerful tools made available by complex analysis is the theory of residues, which makes possible the routine evaluation of certain real definite integrals that are very difficult, and often impossible, to calculate by elementary methods. Example 20.2.6 showed a situation in which an integral was related to expansion coefficients of Laurent series. Here we will develop a systematic way of evaluating both real and complex integrals using the same idea.

Recall that a singular point $z_0$ of f(z) is a point at which f fails to be analytic. If, in addition, there is some neighborhood of $z_0$ in which f is analytic at every point (except, of course, at $z_0$ itself), then $z_0$ is called an isolated singularity of f. All singularities we have encountered so far have been isolated singularities. Although singularities that are not isolated also exist, we shall not discuss them in this book.

21.1 The Residue 

Let $z_0$ be an isolated singularity of f. Then there exists an r > 0 such that, within the “annular” region $0 < |z - z_0| < r$, the function f has the Laurent expansion 1

$$f(z) = \sum_{n=-\infty}^\infty a_n(z - z_0)^n = \sum_{n=0}^\infty a_n(z - z_0)^n + \frac{b_1}{z - z_0} + \frac{b_2}{(z - z_0)^2} + \cdots,$$

where

$$a_n = \frac{1}{2\pi i}\oint_C \frac{f(\xi)\,d\xi}{(\xi - z_0)^{n+1}} \quad\text{and}\quad b_n = \frac{1}{2\pi i}\oint_C f(\xi)(\xi - z_0)^{n-1}\,d\xi.$$

In particular,

$$b_1 = \frac{1}{2\pi i}\oint_C f(\xi)\,d\xi, \tag{21.1}$$

where C is any simple closed contour around $z_0$, traversed in the positive sense, on and interior to which f is analytic except at the point $z_0$ itself.

1 We are using $b_n$ for $a_{-n}$.


Box 21.1.1. The complex number $b_1$, which is $1/(2\pi i)$ times the integral of f(z) along the contour, is called the residue of f at the isolated singular point $z_0$.


It is important to note that the residue is independent of the contour C as long as $z_0$ is the only isolated singular point within C.

Example 21.1.1. We want to evaluate the integral $\oint_C \sin z\,dz/(z - \pi/2)^3$, where $C$ is any simple closed contour having $z = \pi/2$ as an interior point.

To evaluate the integral we expand around $z = \pi/2$ and use Equation (21.1). We note that

$$
\sin z = \cos\left(z - \frac{\pi}{2}\right) = \sum_{n=0}^{\infty} \frac{(-1)^n}{(2n)!}\left(z - \frac{\pi}{2}\right)^{2n} = 1 - \frac{(z - \pi/2)^2}{2!} + \cdots,
$$

so

$$
\frac{\sin z}{(z - \pi/2)^3} = \frac{1}{(z - \pi/2)^3} - \frac{1}{2}\left(\frac{1}{z - \pi/2}\right) + \cdots.
$$

It follows that $b_1 = -1/2$; therefore, $\oint_C \sin z\,dz/(z - \pi/2)^3 = 2\pi i b_1 = -i\pi$. ■
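The value $b_1 = -1/2$ can be cross-checked numerically. The following minimal Python sketch is an illustration, not part of the original treatment. It approximates $\frac{1}{2\pi i}\oint_C f(z)\,dz$ on circles around $\pi/2$ with the trapezoidal rule in the angle, which converges very quickly for smooth periodic integrands. The helper name `circle_integral`, the radii, and the sample count are arbitrary choices for illustration. Using two different radii also illustrates that the residue does not depend on the contour.

```python
import cmath
import math


def circle_integral(f, center, radius, n=400):
    """Trapezoidal-rule approximation of the counterclockwise contour integral
    of f over the circle |z - center| = radius."""
    h = 2 * math.pi / n                     # step in the angle theta
    total = 0j
    for k in range(n):
        w = radius * cmath.exp(1j * k * h)  # z - center on the circle
        total += f(center + w) * 1j * w     # dz = i (z - center) d(theta)
    return total * h


z0 = math.pi / 2
f = lambda z: cmath.sin(z) / (z - z0) ** 3

for radius in (0.5, 2.0):
    integral = circle_integral(f, z0, radius)
    b1 = integral / (2j * math.pi)          # residue via Equation (21.1)
    print(radius, integral, b1)             # integral ~ -i*pi, b1 ~ -0.5
```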

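Example 21.1.2. The integral $\oint_C \cos z\,dz/z^2$, where $C$ is the circle $|z| = 1$, is zero because

$$
\frac{\cos z}{z^2} = \frac{1}{z^2}\sum_{n=0}^{\infty} (-1)^n \frac{z^{2n}}{(2n)!} = \frac{1}{z^2} - \frac{1}{2} + \frac{z^2}{4!} - \cdots
$$

yields $b_1 = 0$ (there is no $1/z$ term in the Laurent expansion). Therefore, by Equation (21.1), the integral must vanish.

When $C$ is the circle $|z| = 2$, $\oint_C e^z\,dz/(z - 1)^3 = i\pi e$ because

$$
e^z = e\,e^{z-1} = e\sum_{n=0}^{\infty} \frac{(z - 1)^n}{n!} = e\left[1 + (z - 1) + \frac{(z - 1)^2}{2!} + \cdots\right]
$$

and

$$
\frac{e^z}{(z - 1)^3} = e\left[\frac{1}{(z - 1)^3} + \frac{1}{(z - 1)^2} + \frac{1}{2}\,\frac{1}{z - 1} + \cdots\right].
$$

Thus, $b_1 = e/2$, and the integral is $2\pi i b_1 = i\pi e$.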




We use the notation $\mathrm{Res}[f(z_0)]$ to denote the residue of $f$ at the isolated singular point $z_0$. Equation (21.1) can then be written as

$$
\oint_C f(z)\,dz = 2\pi i\,\mathrm{Res}[f(z_0)].
$$

What if there are several isolated singular points within the simple closed contour $C$? Let $C_k$ be the circle around $z_k$ shown in Figure 21.1, traversed positively with respect to the shaded region. Then the Cauchy–Goursat theorem yields

$$
0 = \oint_{C'} f(z)\,dz = \oint_{\text{circles}} f(z)\,dz + \int_{\text{parallel lines}} f(z)\,dz + \oint_C f(z)\,dz,
$$

where $C'$ is the union of all contours, inside which union there are no singularities.

Figure 21.1: Singularities are avoided by going around them.

The contributions of the parallel lines cancel out, and we obtain

$$
\oint_C f(z)\,dz = -\sum_{k=1}^{m}\oint_{C_k} f(z)\,dz = \sum_{k=1}^{m} 2\pi i\,\mathrm{Res}[f(z_k)],
$$

where in the last step the definition of the residue at $z_k$ has been used. The minus sign disappears in the final result because the sense of $C_k$, while positive for the shaded region of Figure 21.1, is negative for the interior of $C_k$: this interior is to our right as we traverse $C_k$ in the direction indicated. We thus have


Theorem 21.1.3. (The Residue Theorem). Let $C$ be a positively integrated simple closed contour within and on which a function $f$ is analytic except at a finite number of isolated singular points $z_1, z_2, \ldots, z_m$ interior to $C$. Then

$$
\oint_C f(z)\,dz = 2\pi i\sum_{k=1}^{m}\mathrm{Res}[f(z_k)]. \tag{21.2}
$$


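Example 21.1.4. Let us evaluate the integral $\oint_C (2z - 3)\,dz/[z(z - 1)]$, where $C$ is the circle $|z| = 2$. There are two isolated singularities inside $C$: $z_1 = 0$ and $z_2 = 1$. To find $\mathrm{Res}[f(z_1)]$, we expand around the origin using Equation (20.2):

$$
\frac{2z - 3}{z(z - 1)} = \frac{3}{z} - \frac{1}{z - 1} = \frac{3}{z} + \frac{1}{1 - z} = \frac{3}{z} + 1 + z + z^2 + \cdots \qquad \text{for } |z| < 1.
$$

This gives $\mathrm{Res}[f(z_1)] = 3$. Similarly, expanding around $z = 1$ gives

$$
\frac{2z - 3}{z(z - 1)} = \frac{3}{(z - 1) + 1} - \frac{1}{z - 1} = -\frac{1}{z - 1} + 3\sum_{n=0}^{\infty} (-1)^n (z - 1)^n \qquad \text{for } |z - 1| < 1,
$$

which yields $\mathrm{Res}[f(z_2)] = -1$. Thus,

$$
\oint_C \frac{2z - 3}{z(z - 1)}\,dz = 2\pi i\{\mathrm{Res}[f(z_1)] + \mathrm{Res}[f(z_2)]\} = 2\pi i(3 - 1) = 4\pi i.
$$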
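The residue theorem itself can be checked the same way for Example 21.1.4. In this sketch, which again uses an illustrative trapezoidal helper rather than anything from the text, the integral around $|z| = 2$ is compared with $2\pi i$ times the sum of the residues. Each residue is obtained from a small circle around its own pole; the radii and sample count are assumptions.

```python
import cmath
import math


def circle_integral(f, center, radius, n=2000):
    """Trapezoidal-rule approximation of the counterclockwise integral of f
    around the circle |z - center| = radius."""
    h = 2 * math.pi / n
    total = 0j
    for k in range(n):
        w = radius * cmath.exp(1j * k * h)
        total += f(center + w) * 1j * w
    return total * h


f = lambda z: (2 * z - 3) / (z * (z - 1))

big = circle_integral(f, 0, 2.0)                      # the contour |z| = 2
res0 = circle_integral(f, 0, 0.5) / (2j * math.pi)    # residue at z1 = 0
res1 = circle_integral(f, 1, 0.5) / (2j * math.pi)    # residue at z2 = 1
print(res0, res1)                                     # approximately 3 and -1
print(big, 2j * math.pi * (res0 + res1))              # both approximately 4*pi*i
```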


Let $f(z)$ have an isolated singularity at $z_0$. Then there exist a real number $r > 0$ and an annular region $0 < |z - z_0| < r$ such that $f$ can be represented by the Laurent series

$$
f(z) = \sum_{n=0}^{\infty} a_n (z - z_0)^n + \sum_{n=1}^{\infty} \frac{b_n}{(z - z_0)^n}. \tag{21.3}
$$


The second sum in Equation (21.3), involving negative powers of $(z - z_0)$, is called the principal part of $f$ at $z_0$. The principal part is used to classify isolated singularities. We consider two cases:

(a) If $b_n = 0$ for all $n \ge 1$, $z_0$ is called a removable singular point of $f$. In this case, the Laurent series contains only nonnegative powers of $(z - z_0)$, and setting $f(z_0) = a_0$ makes the function analytic at $z_0$. For example, the function $f(z) = (e^z - 1 - z)/z^2$, which is indeterminate at $z = 0$, becomes entire if we set $f(0) = 1/2$, because its Laurent series

$$
f(z) = \frac{1}{2!} + \frac{z}{3!} + \frac{z^2}{4!} + \cdots
$$

has no negative power.

(b) If $b_n = 0$ for all $n > m$ and $b_m \ne 0$, $z_0$ is called a pole of order $m$. In this case, the expansion takes the form

$$
f(z) = \sum_{n=0}^{\infty} a_n (z - z_0)^n + \frac{b_1}{z - z_0} + \cdots + \frac{b_m}{(z - z_0)^m}
$$

for $0 < |z - z_0| < r$. In particular, if $m = 1$, $z_0$ is called a simple pole.

Example 21.1.5. Let us consider some examples of poles of various orders.

(a) The function $(z^2 - 3z + 5)/(z - 1)$ has a Laurent series around $z = 1$ containing only three terms: $(z^2 - 3z + 5)/(z - 1) = -1 + (z - 1) + 3/(z - 1)$. Thus, it has a simple pole at $z = 1$, with a residue of 3.

(b) The function $\sin z/z^6$ has a Laurent series

$$
\frac{\sin z}{z^6} = \frac{1}{z^6}\sum_{n=0}^{\infty} (-1)^n\frac{z^{2n+1}}{(2n+1)!} = \frac{1}{z^5} - \frac{1}{6z^3} + \frac{1}{(5!)\,z} - \frac{z}{7!} + \cdots
$$

about $z = 0$. The principal part has three terms. The pole, at $z = 0$, is of order 5, and the function has a residue of $1/120$ at $z = 0$.

(c) The function $(z^2 - 5z + 6)/(z - 2)$ has a removable singularity at $z = 2$, because

$$
\frac{z^2 - 5z + 6}{z - 2} = \frac{(z - 2)(z - 3)}{z - 2} = z - 3 = -1 + (z - 2)
$$

and $b_n = 0$ for all $n$. ■
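Because $b_n = \frac{1}{2\pi i}\oint_C f(z)(z - z_0)^{n-1}\,dz$, the principal part, and hence the type of singularity, can also be read off numerically. Below is a minimal sketch for part (b). It assumes the same trapezoidal approximation on a circle as before, and the function name `laurent_b` is illustrative. The order of the pole is the largest $n$ with $b_n \ne 0$.

```python
import cmath
import math


def laurent_b(f, z0, n, radius=1.0, samples=256):
    """Approximate b_n = (1/(2*pi*i)) * contour integral of f(z) (z - z0)^(n-1) dz
    on |z - z0| = radius, i.e. the coefficient of (z - z0)^(-n)."""
    h = 2 * math.pi / samples
    total = 0j
    for k in range(samples):
        w = radius * cmath.exp(1j * k * h)
        total += f(z0 + w) * w ** (n - 1) * 1j * w
    return total * h / (2j * math.pi)


f = lambda z: cmath.sin(z) / z ** 6
b = {n: laurent_b(f, 0, n) for n in range(1, 8)}
for n, value in b.items():
    # expect b1 = 1/120, b3 = -1/6, b5 = 1, all others 0 -> pole of order 5
    print(n, round(value.real, 10), round(value.imag, 10))
```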


The type of isolated singularity that is most important in applications is the second type: poles. For a function that has a pole of order $m$ at $z_0$, the calculation of residues is routine. Such a calculation, in turn, enables us to evaluate many integrals effortlessly. How do we calculate the residue of a function $f$ having a pole of order $m$ at $z_0$?

It is clear that if $f$ has a pole of order $m$, then $g(z)$ defined by $g(z) = (z - z_0)^m f(z)$ is analytic at $z_0$ (once its value there is defined by continuity). Thus, for any simple closed contour $C$ that contains $z_0$ but no other singular point of $f$, we have

$$
\mathrm{Res}[f(z_0)] = \frac{1}{2\pi i}\oint_C f(z)\,dz = \frac{1}{2\pi i}\oint_C \frac{g(z)\,dz}{(z - z_0)^m} = \frac{g^{(m-1)}(z_0)}{(m - 1)!},
$$

where we used Equation (19.10). In terms of $f$ this yields²

$$
\mathrm{Res}[f(z_0)] = \frac{1}{(m - 1)!}\lim_{z \to z_0} \frac{d^{m-1}}{dz^{m-1}}\left[(z - z_0)^m f(z)\right]. \tag{21.4}
$$

For the special, but important, case of a simple pole, we obtain

$$
\mathrm{Res}[f(z_0)] = \lim_{z \to z_0}\left[(z - z_0) f(z)\right]. \tag{21.5}
$$
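A frequently used consequence of (21.5) is given here as a standard supplement, not as part of the original text. Suppose $f = p/q$, where $p$ and $q$ are analytic at $z_0$ and $q$ has a simple zero there ($q(z_0) = 0$, $q'(z_0) \ne 0$). Then

$$
\mathrm{Res}[f(z_0)] = \lim_{z\to z_0}\frac{p(z)}{\dfrac{q(z) - q(z_0)}{z - z_0}} = \frac{p(z_0)}{q'(z_0)}.
$$

For example, for $f(z) = z^2/(z^4 + 10z^2 + 9)$, the integrand of Example 21.2.1 below, $q'(z) = 4z^3 + 20z$. At $z_0 = i$ this gives $p(i)/q'(i) = -1/(16i)$.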


The most widespread application of residues occurs in the evaluation of real definite integrals. It is possible to “complexify” certain real definite integrals and relate them to contour integrations in the complex plane. Typically, a number of semicircles are added to the real integral so that it becomes a closed contour integral whose value can be determined by the residue theorem. One then takes the limit of the contour integral as the radii of the semicircles go to infinity or zero. For the method to work, the contributions from the semicircles in this limit should either vanish or be easy to compute; one then recovers the real integral. Three types of integrals are most commonly encountered, and we discuss them separately below. In all cases we assume that the contribution of the large semicircles vanishes in the limit.


21.2 Integrals of Rational Functions 


The first type of integral we can evaluate using the residue theorem is of the form

$$
I_1 = \int_{-\infty}^{\infty} \frac{p(x)}{q(x)}\,dx,
$$

where $p(x)$ and $q(x)$ are real polynomials, and $q(x) \ne 0$ for any real $x$. We can then write

$$
I_1 = \lim_{R \to \infty}\int_{-R}^{R} \frac{p(x)}{q(x)}\,dx = \lim_{R \to \infty}\int_{C_x} \frac{p(z)}{q(z)}\,dz,
$$

where $C_x$ is the (open) contour lying on the real axis from $-R$ to $+R$. We now close that contour by adding to it the semicircle of radius $R$ [see Figure 21.2(a)]. This will not affect the value of the integral because, by our

² The limit is taken because in many cases the mere substitution of $z_0$ may result in an indeterminate form.
Figure 21.2: (a) The large semicircle is chosen in the UHP. (b) Note how the direction of contour integration is forced to be clockwise when the semicircle is chosen in the LHP.


assumption, the contribution of the integral over the semicircle tends to zero in the limit $R \to \infty$ (this is guaranteed when the degree of $q$ exceeds that of $p$ by at least 2). We close the contour in the upper half-plane (UHP) if $q(z)$ has a zero there. We then get

$$
I_1 = \lim_{R \to \infty}\oint_C \frac{p(z)}{q(z)}\,dz = 2\pi i\sum_{j=1}^{k}\mathrm{Res}\left[\frac{p(z_j)}{q(z_j)}\right],
$$

where $C$ is the closed contour composed of the interval $(-R, R)$ and the semicircle $C_R$, and $\{z_j\}_{j=1}^{k}$ are the zeros of $q(z)$ in the UHP. We may instead close the contour in the lower half-plane (LHP), in which case

$$
I_1 = -2\pi i\sum_{j=1}^{m}\mathrm{Res}\left[\frac{p(z_j)}{q(z_j)}\right],
$$

where $\{z_j\}_{j=1}^{m}$ are the zeros of $q(z)$ in the LHP. The minus sign indicates that in the LHP we (are forced to) integrate in the negative sense.
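The recipe for $I_1$ translates directly into code when all zeros of $q$ are simple. The sketch below illustrates it under stated assumptions and is not the book's method of finding roots. It computes the zeros of $q$ with the Durand–Kerner iteration, a root finder swapped in only to keep the example free of external libraries. It then keeps the zeros in the UHP and uses $\mathrm{Res}[p/q] = p(z_j)/q'(z_j)$ at each simple zero. It assumes $\deg q \ge \deg p + 2$ and that $q$ has no real or repeated zeros. Coefficient lists are given in descending powers.

```python
import math


def poly_eval(coeffs, z):
    """Evaluate a polynomial given by coefficients in descending powers (Horner's rule)."""
    value = 0j
    for c in coeffs:
        value = value * z + c
    return value


def poly_deriv(coeffs):
    """Coefficients (descending powers) of the derivative polynomial."""
    n = len(coeffs) - 1
    return [c * (n - k) for k, c in enumerate(coeffs[:-1])]


def poly_roots(coeffs, iterations=500):
    """All zeros via the Durand-Kerner iteration; assumes the zeros are simple."""
    monic = [c / coeffs[0] for c in coeffs]
    n = len(monic) - 1
    roots = [(0.4 + 0.9j) ** k for k in range(n)]   # conventional starting values
    for _ in range(iterations):
        updated = []
        for i, r in enumerate(roots):
            denom = 1 + 0j
            for j, s in enumerate(roots):
                if j != i:
                    denom *= r - s
            updated.append(r - poly_eval(monic, r) / denom)
        roots = updated
    return roots


def rational_integral(p, q):
    """Integral of p(x)/q(x) over the real line by closing the contour in the UHP.
    Assumes deg q >= deg p + 2 and that q has only simple, non-real zeros,
    so the residue at each UHP zero z_j is p(z_j)/q'(z_j)."""
    dq = poly_deriv(q)
    total = 0j
    for z in poly_roots(q):
        if z.imag > 0:
            total += poly_eval(p, z) / poly_eval(dq, z)
    return 2j * math.pi * total


# Example 21.2.1: x^2 / ((x^2 + 1)(x^2 + 9)) = x^2 / (x^4 + 10x^2 + 9)
value = rational_integral([1, 0, 0], [1, 0, 10, 0, 9])
print(value.real / 2, math.pi / 8)   # half of the full-line integral gives I
```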

Example 21.2.1. Let us evaluate the integral $I = \int_0^\infty x^2\,dx/[(x^2 + 1)(x^2 + 9)]$. Since the integrand is even, we can extend the interval of integration to all real numbers (and divide the result by 2). It is shown below that, as the radius of the semicircle goes to infinity, the integral over that semicircle goes to zero. Therefore, we write the contour integral corresponding to $I$:

$$
I = \frac{1}{2}\oint_C \frac{z^2\,dz}{(z^2 + 1)(z^2 + 9)},
$$

where $C$ is as shown in Figure 21.2(a). Note that the contour is integrated in the positive sense. This is always true for the UHP. The singularities of the function in the UHP are the simple poles $i$ and $3i$, corresponding to the simple zeros of the denominator. By (21.5), the residues at these poles are

$$
\mathrm{Res}[f(i)] = \lim_{z \to i}\left[(z - i)\frac{z^2}{(z - i)(z + i)(z^2 + 9)}\right] = -\frac{1}{16i},
$$

$$
\mathrm{Res}[f(3i)] = \lim_{z \to 3i}\left[(z - 3i)\frac{z^2}{(z^2 + 1)(z - 3i)(z + 3i)}\right] = \frac{3}{16i}.
$$

Thus, we obtain

$$
I = \int_0^\infty \frac{x^2\,dx}{(x^2 + 1)(x^2 + 9)} = \frac{1}{2}\oint_C \frac{z^2\,dz}{(z^2 + 1)(z^2 + 9)} = \frac{1}{2}(2\pi i)\left(-\frac{1}{16i} + \frac{3}{16i}\right) = \frac{\pi}{8}.
$$

It is instructive to obtain the same result using the LHP. In this case the contour is as shown in Figure 21.2(b). It is clear that the interior is to our right as we traverse the contour, so we have to introduce a minus sign for its integration. The singular points are at $z = -i$ and $z = -3i$. These are simple poles at which the residues of the function are

$$
\mathrm{Res}[f(-i)] = \lim_{z \to -i}\left[(z + i)\frac{z^2}{(z - i)(z + i)(z^2 + 9)}\right] = \frac{1}{16i},
$$

$$
\mathrm{Res}[f(-3i)] = \lim_{z \to -3i}\left[(z + 3i)\frac{z^2}{(z^2 + 1)(z - 3i)(z + 3i)}\right] = -\frac{3}{16i}.
$$

Therefore,

$$
I = \int_0^\infty \frac{x^2\,dx}{(x^2 + 1)(x^2 + 9)} = \frac{1}{2}\oint_C \frac{z^2\,dz}{(z^2 + 1)(z^2 + 9)} = \frac{1}{2}(-2\pi i)\left(\frac{1}{16i} - \frac{3}{16i}\right) = \frac{\pi}{8}.
$$

We now show that the integral over the large semicircle $\Gamma$ tends to zero. On such a semicircle, $z = Re^{i\theta}$ with $0 \le \theta \le \pi$; therefore

$$
\int_\Gamma \frac{z^2\,dz}{(z^2 + 1)(z^2 + 9)} = \int_0^\pi \frac{R^2 e^{2i\theta}\,iRe^{i\theta}\,d\theta}{(R^2 e^{2i\theta} + 1)(R^2 e^{2i\theta} + 9)}.
$$

As $R \to \infty$, we can ignore the small numbers 1 and 9 in the denominator. The overall integral then becomes $1/R$ times a finite integral over $\theta$. It follows that as $R$ tends to infinity, the contribution of the large semicircle indeed goes to zero. ■
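A direct numerical quadrature gives an independent check of $I = \pi/8$. This sketch is illustrative only. It maps $[0, \infty)$ onto $[0, \pi/2)$ with the substitution $x = \tan t$, $dx = \sec^2 t\,dt$, and applies the midpoint rule, which never evaluates the endpoint $t = \pi/2$. The function name and the number of points are arbitrary choices.

```python
import math


def half_line_integral(f, n=4000):
    """Approximate the integral of f over [0, inf) via x = tan(t), dx = sec^2(t) dt,
    using the composite midpoint rule on [0, pi/2] (midpoints avoid t = pi/2)."""
    h = (math.pi / 2) / n
    total = 0.0
    for k in range(n):
        t = (k + 0.5) * h
        total += f(math.tan(t)) / math.cos(t) ** 2
    return total * h


f = lambda x: x ** 2 / ((x ** 2 + 1) * (x ** 2 + 9))
approx = half_line_integral(f)
print(approx, math.pi / 8)
```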

Example 21.2.2. Let us now consider a more complicated integral:

$$
\int_{-\infty}^{\infty} \frac{x^2\,dx}{(x^2 + 1)(x^2 + 4)^2},
$$

which turns into $\oint_C z^2\,dz/[(z^2 + 1)(z^2 + 4)^2]$. The poles in the UHP are at $z = i$ and $z = 2i$. The former is a simple pole, and the latter is a pole of order 2. Thus,

$$
\mathrm{Res}[f(i)] = \lim_{z \to i}\left[(z - i)\frac{z^2}{(z - i)(z + i)(z^2 + 4)^2}\right] = -\frac{1}{18i},
$$

$$
\mathrm{Res}[f(2i)] = \frac{1}{(2 - 1)!}\lim_{z \to 2i}\frac{d}{dz}\left[(z - 2i)^2\frac{z^2}{(z^2 + 1)(z + 2i)^2(z - 2i)^2}\right] = \lim_{z \to 2i}\frac{d}{dz}\left[\frac{z^2}{(z^2 + 1)(z + 2i)^2}\right] = \frac{5}{72i},
$$

and

$$
\int_{-\infty}^{\infty} \frac{x^2\,dx}{(x^2 + 1)(x^2 + 4)^2} = 2\pi i\left(-\frac{1}{18i} + \frac{5}{72i}\right) = \frac{\pi}{36}.
$$

Closing the contour in the LHP would yield the same result, as the reader is urged to verify. ■
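Formula (21.4) with $m = 2$ requires only the first derivative of $g(z) = (z - 2i)^2 f(z)$. In this sketch the derivative is approximated by a central difference along the real direction. That is legitimate because $g$ is analytic near $2i$. The step size is an assumption chosen for double precision, and the names are illustrative.

```python
import math


def g(z):
    # g(z) = (z - 2i)^2 f(z) for f(z) = z^2 / ((z^2 + 1)(z^2 + 4)^2)
    return z ** 2 / ((z ** 2 + 1) * (z + 2j) ** 2)


def derivative(func, z0, h=1e-4):
    """Central-difference approximation of func'(z0) for an analytic func."""
    return (func(z0 + h) - func(z0 - h)) / (2 * h)


res_i = (1j) ** 2 / ((2j) * ((1j) ** 2 + 4) ** 2)   # simple pole, Eq. (21.5)
res_2i = derivative(g, 2j)                           # order-2 pole, Eq. (21.4), m = 2
integral = 2j * math.pi * (res_i + res_2i)
print(res_i, -1 / 18j)
print(res_2i, 5 / 72j)
print(integral, math.pi / 36)
```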
21.3 Products of Rational and Trigonometric Functions


The second type of integral we can evaluate using the residue theorem is of the form

$$
\int_{-\infty}^{\infty} \frac{p(x)}{q(x)}\cos ax\,dx \quad\text{or}\quad \int_{-\infty}^{\infty} \frac{p(x)}{q(x)}\sin ax\,dx,
$$

where $a$ is a real number, $p(x)$ and $q(x)$ are real polynomials in $x$, and $q(x)$ has no real zeros. These integrals are the real and imaginary parts of

$$
I_2 = \int_{-\infty}^{\infty} \frac{p(x)}{q(x)}e^{iax}\,dx.
$$


The presence of $e^{iax}$ dictates the choice of the half-plane. If $a > 0$, we choose the UHP because

$$
e^{iaz} = e^{ia(x + iy)} = e^{iax}e^{-ay}, \qquad \text{where } y > 0,
$$

and the negative exponent ensures convergence for large $R$ and $y$. For the same reason, we choose the LHP when $a < 0$. The following examples illustrate the procedure.

Example 21.3.1. Let us evaluate $\int_{-\infty}^{\infty}\cos ax\,dx/(x^2 + 1)^2$, where $a \ne 0$. This integral is the real part of the integral $I_2 = \int_{-\infty}^{\infty} e^{iax}\,dx/(x^2 + 1)^2$. When $a > 0$, we close in the UHP and proceed as for integrals of rational functions. Thus, we have

$$
I_2 = \oint_C \frac{e^{iaz}}{(z^2 + 1)^2}\,dz = 2\pi i\,\mathrm{Res}[f(i)] \qquad \text{for } a > 0,
$$

because there is only one singularity in the UHP, at $z = i$, which is a pole of order 2. We next calculate the residue:

$$
\mathrm{Res}[f(i)] = \lim_{z \to i}\frac{d}{dz}\left[(z - i)^2\frac{e^{iaz}}{(z - i)^2(z + i)^2}\right] = \lim_{z \to i}\frac{d}{dz}\left[\frac{e^{iaz}}{(z + i)^2}\right] = \lim_{z \to i}\frac{(z + i)\,ia\,e^{iaz} - 2e^{iaz}}{(z + i)^3} = \frac{e^{-a}}{4i}(1 + a).
$$

Substituting this in the expression for $I_2$, we obtain $I_2 = (\pi/2)e^{-a}(1 + a)$ for $a > 0$.

When $a < 0$, we have to close the contour in the LHP, where the pole of order 2 is at $z = -i$ and the contour is taken clockwise. Thus, we get

$$
I_2 = \oint_C \frac{e^{iaz}}{(z^2 + 1)^2}\,dz = -2\pi i\,\mathrm{Res}[f(-i)] \qquad \text{for } a < 0.
$$

For the residue we obtain

$$
\mathrm{Res}[f(-i)] = \lim_{z \to -i}\frac{d}{dz}\left[(z + i)^2\frac{e^{iaz}}{(z - i)^2(z + i)^2}\right] = -\frac{e^{a}}{4i}(1 - a),
$$

and the expression for $I_2$ becomes $I_2 = (\pi/2)e^{a}(1 - a)$ for $a < 0$. We can combine the two results and write

$$
\int_{-\infty}^{\infty} \frac{\cos ax}{(x^2 + 1)^2}\,dx = \mathrm{Re}(I_2) = I_2 = \frac{\pi}{2}(1 + |a|)e^{-|a|}.
$$
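As a numerical cross-check of the combined result, the following sketch integrates $\cos ax/(x^2 + 1)^2$ with the composite Simpson rule on a truncated interval. The cutoff and the number of subintervals are illustrative assumptions. Because $(x^2 + 1)^{-2} < x^{-4}$, the neglected tail is at most $2/(3X^3)$ for a cutoff $X$.

```python
import math


def simpson(f, a, b, n):
    """Composite Simpson rule on [a, b] with n (even) subintervals."""
    if n % 2:
        raise ValueError("n must be even")
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3


def lhs(a, cutoff=200.0, n=40000):
    # The integrand is even, so integrate over [0, cutoff] and double;
    # the neglected tail is bounded by 2/(3 * cutoff**3).
    return 2 * simpson(lambda x: math.cos(a * x) / (x * x + 1) ** 2, 0.0, cutoff, n)


def rhs(a):
    return math.pi / 2 * (1 + abs(a)) * math.exp(-abs(a))


for a in (1.0, -2.0, 3.5):
    print(a, lhs(a), rhs(a))
```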
Example 21.3.2. As another example, let us evaluate

$$
\int_{-\infty}^{\infty} \frac{x\sin ax}{x^4 + 4}\,dx, \qquad \text{where } a \ne 0.
$$

This is the imaginary part of the integral $I_2 = \int_{-\infty}^{\infty} xe^{iax}\,dx/(x^4 + 4)$. In terms of $z$ and for the closed contour in the UHP (when $a > 0$), it becomes

$$
I_2 = \oint_C \frac{ze^{iaz}}{z^4 + 4}\,dz = 2\pi i\sum_{j=1}^{2}\mathrm{Res}[f(z_j)] \qquad \text{for } a > 0, \tag{21.6}
$$

where $C$ is the large semicircle in the UHP. The singularities are determined by the zeros of the denominator: $z^4 + 4 = 0$, or $z = 1 \pm i,\ -1 \pm i$. Of these four simple poles only two, $1 + i$ and $-1 + i$, are in the UHP. We now calculate the residues:

$$
\mathrm{Res}[f(1 + i)] = \lim_{z \to 1 + i}(z - 1 - i)\frac{ze^{iaz}}{(z - 1 - i)(z - 1 + i)(z + 1 - i)(z + 1 + i)} = \frac{(1 + i)e^{ia(1 + i)}}{(2i)(2)(2 + 2i)} = \frac{e^{ia}e^{-a}}{8i},
$$

$$
\mathrm{Res}[f(-1 + i)] = \lim_{z \to -1 + i}(z + 1 - i)\frac{ze^{iaz}}{(z + 1 - i)(z + 1 + i)(z - 1 - i)(z - 1 + i)} = \frac{(-1 + i)e^{ia(-1 + i)}}{(2i)(-2)(-2 + 2i)} = -\frac{e^{-ia}e^{-a}}{8i}.
$$

Substituting in Equation (21.6), we obtain

$$
I_2 = 2\pi i\,\frac{e^{-a}}{8i}\left(e^{ia} - e^{-ia}\right) = i\,\frac{\pi}{2}e^{-a}\sin a.
$$

Thus,

$$
\int_{-\infty}^{\infty} \frac{x\sin ax}{x^4 + 4}\,dx = \mathrm{Im}(I_2) = \frac{\pi}{2}e^{-a}\sin a \qquad \text{for } a > 0. \tag{21.7}
$$

For $a < 0$, we could close the contour in the LHP, but there is an easier way of getting to the answer. We note that $-a > 0$, and Equation (21.7) yields

$$
\int_{-\infty}^{\infty} \frac{x\sin ax}{x^4 + 4}\,dx = -\int_{-\infty}^{\infty} \frac{x\sin[(-a)x]}{x^4 + 4}\,dx = -\frac{\pi}{2}e^{-(-a)}\sin(-a) = \frac{\pi}{2}e^{a}\sin a.
$$

We can collect the two cases in

$$
\int_{-\infty}^{\infty} \frac{x\sin ax}{x^4 + 4}\,dx = \frac{\pi}{2}e^{-|a|}\sin a.
$$
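The two residues can also be generated mechanically. Since the poles are simple zeros of $q(z) = z^4 + 4$, $\mathrm{Res}[f(z_j)] = z_j e^{iaz_j}/q'(z_j) = e^{iaz_j}/(4z_j^2)$. The sketch below is illustrative, and its function names are hypothetical. It sums these residues and compares $\mathrm{Im}(I_2)$ with $(\pi/2)e^{-|a|}\sin a$. It also shows that $I_2$ is purely imaginary, as it must be, because $x\cos ax/(x^4 + 4)$ is odd.

```python
import cmath
import math


def I2_via_residues(a):
    """I_2 = 2*pi*i * (sum of residues of z e^{iaz}/(z^4 + 4) at the UHP poles), a > 0.
    At a simple zero z_j of q(z) = z^4 + 4, Res = z_j e^{i a z_j} / q'(z_j)."""
    poles = [1 + 1j, -1 + 1j]
    total = 0j
    for zj in poles:
        total += zj * cmath.exp(1j * a * zj) / (4 * zj ** 3)
    return 2j * math.pi * total


def closed_form(a):
    # The combined result (pi/2) e^{-|a|} sin a
    return math.pi / 2 * math.exp(-abs(a)) * math.sin(a)


for a in (0.5, 1.0, 2.0):
    I2 = I2_via_residues(a)
    print(a, I2, closed_form(a))
```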


Example 21.3.3. The integral $\int_0^\infty (\sin ax/x)\,dx$ occurs frequently in physics. To evaluate it, we first assume that $a > 0$. Since the integrand is even, we can extend the lower limit of integration to $-\infty$ and write

$$
\int_0^\infty \frac{\sin ax}{x}\,dx = \frac{1}{2}\int_{-\infty}^{\infty} \frac{\sin ax}{x}\,dx.
$$

As in the previous examples, we are inclined to choose the contour $C$ in the UHP and to integrate $e^{iaz}/z$, whose imaginary part on the real axis is $\sin ax/x$. However, $C$ passes through the origin, which is a pole of $e^{iaz}/z$, so this will not work as it stands. So, let’s avoid the origin by going around it on a small
Figure 21.3: To avoid the origin, move on an infinitesimal semicircle $\gamma_\epsilon$ of radius $\epsilon$.


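semicircle of radius $\epsilon$, as shown in Figure 21.3. This contour does not surround a pole. Therefore, since the large semicircle contributes nothing in the limit, we can write

$$
0 = \oint_C \frac{e^{iaz}}{z}\,dz = \int_{-\infty}^{-\epsilon} \frac{e^{iax}}{x}\,dx + \int_{\gamma_\epsilon} \frac{e^{iaz}}{z}\,dz + \int_{\epsilon}^{\infty} \frac{e^{iax}}{x}\,dx.
$$

As $\epsilon \to 0$, the two integrals in $x$ become a single integral over all real numbers (understood as a principal value). Thus, we get

$$
\int_{-\infty}^{\infty} \frac{e^{iax}}{x}\,dx = -\lim_{\epsilon \to 0}\int_{\gamma_\epsilon} \frac{e^{iaz}}{z}\,dz.
$$

But on $\gamma_\epsilon$, $z = \epsilon e^{i\theta}$. Thus

$$
\lim_{\epsilon \to 0}\int_{\gamma_\epsilon} \frac{e^{iaz}}{z}\,dz = \lim_{\epsilon \to 0}\int_\pi^0 \frac{e^{ia\epsilon e^{i\theta}}}{\epsilon e^{i\theta}}\,i\epsilon e^{i\theta}\,d\theta = i\lim_{\epsilon \to 0}\int_\pi^0 e^{ia\epsilon e^{i\theta}}\,d\theta = i\int_\pi^0 d\theta = -i\pi
$$

and

$$
\int_{-\infty}^{\infty} \frac{e^{iax}}{x}\,dx = i\pi.
$$

Putting everything together, we obtain

$$
\int_0^\infty \frac{\sin ax}{x}\,dx = \frac{1}{2}\int_{-\infty}^{\infty} \frac{\sin ax}{x}\,dx = \frac{1}{2}\,\mathrm{Im}\int_{-\infty}^{\infty} \frac{e^{iax}}{x}\,dx = \frac{1}{2}\,\mathrm{Im}(i\pi) = \frac{\pi}{2}.
$$

If $a < 0$, then $\sin ax = -\sin|a|x$, and we get the negative of the answer above.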
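The key limit, $\int_{\gamma_\epsilon} e^{iaz}\,dz/z \to -i\pi$, can be seen numerically. In this sketch the name `small_semicircle` and the sample count are assumptions. The parametrization $z = \epsilon e^{i\theta}$ turns the integrand into $ie^{iaz}\,d\theta$, and the integral from $\theta = \pi$ to $0$ is evaluated with the midpoint rule.

```python
import cmath
import math


def small_semicircle(a, eps, n=2000):
    """Integral of e^{iaz}/z along gamma_eps: z = eps*e^{i*theta}, theta from pi to 0.
    Since dz = i z d(theta), the integrand reduces to i*e^{iaz}; midpoint rule in theta."""
    h = math.pi / n
    total = 0j
    for k in range(n):
        theta = (k + 0.5) * h
        z = eps * cmath.exp(1j * theta)
        total += 1j * cmath.exp(1j * a * z)
    return -total * h   # minus sign: theta decreases from pi to 0 along gamma_eps


for eps in (1e-1, 1e-3, 1e-6):
    print(eps, small_semicircle(1.0, eps))   # tends to -i*pi as eps -> 0
```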


21.4 Functions of Trigonometric Functions 

The third type of integral we can evaluate using the residue theorem involves only trigonometric functions and is typically of the form

$$
\int_0^{2\pi} F(\sin\theta, \cos\theta)\,d\theta,
$$

where $F$ is some (typically rational) function³ of its arguments. Since $\theta$ varies from 0 to $2\pi$, we can consider it as the angle of a point $z$ on the unit circle centered at the origin. Then $z = e^{i\theta}$ and $e^{-i\theta} = 1/z$. We can therefore substitute $\cos\theta = (z + 1/z)/2$, $\sin\theta = (z - 1/z)/(2i)$, and $d\theta = dz/(iz)$ in the original integral to obtain

$$
\oint_C F\left(\frac{z - 1/z}{2i}, \frac{z + 1/z}{2}\right)\frac{dz}{iz}.
$$

This integral can often be evaluated using the method of residues.

³ Recall that a rational function is, by definition, the ratio of two polynomials.

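Example 21.4.1. Let us evaluate the integral $\int_0^{2\pi} d\theta/(1 + a\cos\theta)$, where $|a| < 1$. Substituting for $\cos\theta$ and $d\theta$ in terms of $z$, we obtain

$$
\oint_C \frac{dz/iz}{1 + a[(z^2 + 1)/2z]} = \frac{2}{i}\oint_C \frac{dz}{2z + az^2 + a},
$$

where $C$ is the unit circle centered at the origin. The singularities of the integrand are the zeros of its denominator, $2z + az^2 + a = a(z - z_1)(z - z_2)$, with

$$
z_1 = \frac{-1 + \sqrt{1 - a^2}}{a} \quad\text{and}\quad z_2 = \frac{-1 - \sqrt{1 - a^2}}{a}.
$$

For $|a| < 1$ (and $a \ne 0$), $z_2$ lies outside the unit circle $C$, since $|z_2| = (1 + \sqrt{1 - a^2})/|a| > 1$; therefore, it does not contribute to the integral. But $z_1$ lies inside, and we obtain

$$
\oint_C \frac{dz}{2z + az^2 + a} = 2\pi i\,\mathrm{Res}[f(z_1)].
$$

The residue of the simple pole at $z_1$ can be calculated:

$$
\mathrm{Res}[f(z_1)] = \lim_{z \to z_1}(z - z_1)\frac{1}{a(z - z_1)(z - z_2)} = \frac{1}{a}\left(\frac{1}{z_1 - z_2}\right) = \frac{1}{a}\left(\frac{a}{2\sqrt{1 - a^2}}\right) = \frac{1}{2\sqrt{1 - a^2}}.
$$

It follows that

$$
\int_0^{2\pi} \frac{d\theta}{1 + a\cos\theta} = \frac{2}{i}\oint_C \frac{dz}{2z + az^2 + a} = \frac{2}{i}(2\pi i)\frac{1}{2\sqrt{1 - a^2}} = \frac{2\pi}{\sqrt{1 - a^2}}.
$$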
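The closed form $2\pi/\sqrt{1 - a^2}$ is easy to confirm numerically. The integrand is smooth and $2\pi$-periodic, so the plain trapezoidal rule over one period converges geometrically fast. The sketch below relies on that property; the values of $a$ and the sample count are arbitrary choices.

```python
import math


def periodic_trapezoid(F, n=512):
    """Trapezoidal rule over one period [0, 2*pi]; geometrically convergent
    for smooth periodic integrands."""
    h = 2 * math.pi / n
    return h * sum(F(k * h) for k in range(n))


for a in (0.3, -0.7, 0.9):
    numeric = periodic_trapezoid(lambda t: 1 / (1 + a * math.cos(t)))
    exact = 2 * math.pi / math.sqrt(1 - a * a)
    print(a, numeric, exact)
```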


Example 21.4.2. As another example, let us consider the integral

$$
I = \int_0^\pi \frac{d\theta}{(a + \cos\theta)^2}, \qquad \text{where } a > 1.
$$

Since $\cos\theta$ is an even function of $\theta$, we may write

$$
I = \frac{1}{2}\int_{-\pi}^{\pi} \frac{d\theta}{(a + \cos\theta)^2}.
$$

This integration is over a complete cycle around the origin, and we can make the usual substitution:

$$
I = \frac{1}{2}\oint_C \frac{dz/iz}{[a + (z^2 + 1)/2z]^2} = \frac{2}{i}\oint_C \frac{z\,dz}{(z^2 + 2az + 1)^2}.
$$

The denominator has the roots $z_1 = -a + \sqrt{a^2 - 1}$ and $z_2 = -a - \sqrt{a^2 - 1}$, which are both of order 2. The second root is outside the unit circle because $a > 1$. The reader may verify that for all $a > 1$, $z_1$ is inside the unit circle. Since $z_1$ is a pole of order 2, we have

$$
\mathrm{Res}[f(z_1)] = \lim_{z \to z_1}\frac{d}{dz}\left[(z - z_1)^2\frac{z}{(z - z_1)^2(z - z_2)^2}\right] = \lim_{z \to z_1}\frac{d}{dz}\left[\frac{z}{(z - z_2)^2}\right] = \frac{1}{(z_1 - z_2)^2} - \frac{2z_1}{(z_1 - z_2)^3} = \frac{a}{4(a^2 - 1)^{3/2}}.
$$

We thus obtain

$$
I = \frac{2}{i}\,2\pi i\,\mathrm{Res}[f(z_1)] = \frac{\pi a}{(a^2 - 1)^{3/2}}.
$$
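A similar numerical check works for Example 21.4.2. The sketch below is illustrative, and its names are hypothetical. It evaluates the residue expression $1/(z_1 - z_2)^2 - 2z_1/(z_1 - z_2)^3$ and compares it with $a/[4(a^2 - 1)^{3/2}]$. It then compares $I = 4\pi\,\mathrm{Res}[f(z_1)]$ with a midpoint-rule quadrature of the original integral.

```python
import math


def residue_z1(a):
    """Residue of z/(z^2 + 2az + 1)^2 at the double root z1 = -a + sqrt(a^2 - 1),
    computed as d/dz [z/(z - z2)^2] evaluated at z1."""
    s = math.sqrt(a * a - 1)
    z1, z2 = -a + s, -a - s
    return 1 / (z1 - z2) ** 2 - 2 * z1 / (z1 - z2) ** 3


def I_closed(a):
    return math.pi * a / (a * a - 1) ** 1.5


def I_numeric(a, n=2000):
    """Midpoint rule for the integral of 1/(a + cos t)^2 over [0, pi]."""
    h = math.pi / n
    return h * sum(1 / (a + math.cos((k + 0.5) * h)) ** 2 for k in range(n))


for a in (1.5, 2.0, 5.0):
    res = residue_z1(a)
    print(a, res, a / (4 * (a * a - 1) ** 1.5))
    print(a, 4 * math.pi * res, I_closed(a), I_numeric(a))
```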


21.5 Problems 

21.1. Evaluate each of the following integrals, for all of which $C$ is the circle $|z| = 3$:


( \ As ^2 ^ 

(a) <p —-— dz 


(d) 

(g) 


c z(z-2) 
z 2 + 1 
c z ( z ~ 1) 
sinh z 


(b) 

dz. (e) j) 


c z(z - in) 
cosh z 


dz. 


~2 1 _2 

C 2 + n 


dz. 


dz. 


c 


(j) j) tan zdz. 

(m) (f — dz. 
J c z 2 sin 3 


(h) j) z cos ) dz. 
(k) </> . , „ dz. 


(c) 

(f) 

(i) 


cos 2 
'c z ( z - ?r) 

1 — COS z 
^2 

c z 
dz 


c 


3 (2 + 5) 


dz. 

dz. 

dz. 


( n ) 


c sinh 22 

e* dz 


(!) f^dz. 


T c (2 1) (2 2)* 

21.2. Find the residue of $f(z) = 1/\cos z$ at all its poles.

21.3. Evaluate the integral $\int_0^\infty dx/[(x^2 + 1)(x^2 + 4)]$ by closing the contour (a) in the UHP and (b) in the LHP.

21.4. Evaluate the following integrals, in which $a$ and $b$ are nonzero real constants:

,, r 2x 2 + 1 , 

(a) i 0 x 4 + 5x 2 + 6 


(d) 

(g) 

(j) 

(m) / 
(P) 


cos x dx 


o (x 2 + a 2 ) 2 (x 2 + b 2 ) 

r°° dx 

0 (x 2 + l) 2 (x 2 + 2) 

r °° xdx 


(b) 

(e) 


dx 


6x 4 + 5x 2 + 1 


(c) 


dx 


x 4 + 1 


cos ax 


1 0 (x 2 + & 2 ) 2 

r°° 2r 2 - 1 

(h) / ZFTT dx - 


dx. (f) f 

Jo 

(i) 


dx 


-00 ( x 2 + 4x + 13) 2 ' 
00 x cos x dx 


(k) 

( n ) 

(q) 


-**■ (1) io ^ 


0 (x 2 + l) 2 ' 

r°° x 2 dx 

0 (x 2 + a 2 ) 2 ' 

l '°°x 2 + l , 

• dx. 


x sin x dx 


00 

00 


cos ax 


dx. 


(o) 


(r) 


dx 


dx 


0 (x 2 + 4) 2 (x 2 + 25)' J o x 2 + b 2 ' w 7 0 (x 2 + 4) 2 ' 

21.5. Evaluate each of the following integrals by turning them into contour 
integrals around the unit circle. 
(>2tt


(a) 


(c) 


(e) 


d.0 


r>2i r 


/ 0 


>0 


5 + 4 sin 9 

2 * de 
1 + sin 2 9 
r 2 * cos 2 39 


I o 5 — 4 cos 2 
(g) 


(b) 


(d) 


d0 


>o 


a + cos 0 
2,r d9 


dJ9. 


(f) 

cos 2 3 


0 (a + &cos 2 0) 2 
d(j> 


(b) 


1 — 2a cos <j) + a 2 
cos 2 (j)d(j) 

1 0 1 — 2a cos (j> + a 2 


1 — 2a cos 4> + a 2 
where a ^ ±1. 


where a > 1. 

where a,b > 0. 
where a ^ ±1. 


where a / ±1. 


21.6. Use the method of residues to show that

$$
\int_0^\pi \cos^{2n}\theta\,d\theta = \pi\,\frac{(2n)!}{2^{2n}(n!)^2}.
$$


21.7. Use the contour in Figure 21.4(a) to show that

$$
\int_0^\infty \frac{\sin x}{x}\,dx = \frac{\pi}{2}
$$

by letting $X \to \infty$, $Y \to \infty$, and $\epsilon \to 0$.

21.8. Use the contour in Figure 21.4(b) to show that

$$
\int_0^\infty \frac{dx}{1 + x^n} = \frac{\pi/n}{\sin(\pi/n)}
$$

by letting $R \to \infty$.

21.9. Use the contour in Figure 21.4(c) to show that

$$
\int_0^\infty \sin(x^2)\,dx = \int_0^\infty \cos(x^2)\,dx
$$

by letting $R \to \infty$.





Figure 21.4: (a) The contour used for $\sin x/x$. (b) The contour used for $1/(1 + x^n)$. (c) The contour used for $\sin(x^2)$.




Part VI 

Differential Equations 



Chapter 22 

From PDEs to ODEs 


Physics, as the most exact science, is characterized by its ability to make 
mathematical predictions. Predictions are based on two factors: the initial 
information (data), and the law governing the physical process. Knowing 
what the situation is here and now (initial data, initial conditions, boundary 
conditions) enables physics to predict what the situation will be there and 
then. This ability to predict is based on the intuitive belief that physical 
quantities, dependent on continuous parameters such as position and time, 
must be continuous functions of those parameters. Thus, knowledge of the 
values of those functions at one (initial) point and of how the functions change 
from one point to a neighboring point (given by the laws of physics) allows 
the values of the functions at the neighboring point to be predicted. Once 
the values of the functions are determined at the new point, their values can 
be predicted for its neighboring points, and the process can continue until a 
distant point is reached. 

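In mechanics, for example, suppose we know the force acting on a particle of mass $m$ that is located at $\mathbf{r}_0$ and moving with momentum $\mathbf{p}_0$ at time $t_0$. This allows its momentum and position at a later time $t_0 + \Delta t$ to be predicted as follows. Because $d\mathbf{p}/dt = \mathbf{F}$ by Newton’s second law of motion, we have

$$
\Delta\mathbf{p} \approx \mathbf{F}(\mathbf{r}_0, \mathbf{p}_0, t_0)\,\Delta t
$$

and

$$
\mathbf{p}(t_0 + \Delta t) = \mathbf{p}_0 + \Delta\mathbf{p} \approx \mathbf{p}_0 + \mathbf{F}(\mathbf{r}_0, \mathbf{p}_0, t_0)\,\Delta t.
$$

Similarly,

$$
\mathbf{r}(t_0 + \Delta t) \approx \mathbf{r}_0 + \mathbf{v}_0\,\Delta t = \mathbf{r}_0 + \frac{\mathbf{p}_0}{m}\,\Delta t.
$$

The smaller $\Delta t$ is, the better the prediction will be.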
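The stepping procedure just described is the explicit Euler method. Below is a minimal one-dimensional Python sketch. It assumes a hypothetical force law $F = -kr$: a harmonic oscillator with $m = k = 1$, whose exact position is $\cos t$ when $r_0 = 1$ and $p_0 = 0$. The sketch shows the prediction at $t = 1$ improving as $\Delta t$ shrinks.

```python
import math


def euler_predict(force, m, r0, p0, t0, dt, steps):
    """Explicit Euler in one dimension:
    p(t + dt) ~ p + F(r, p, t) dt,  r(t + dt) ~ r + (p / m) dt."""
    r, p, t = r0, p0, t0
    for _ in range(steps):
        f = force(r, p, t)                  # force evaluated at the current state
        r, p, t = r + (p / m) * dt, p + f * dt, t + dt
    return r, p, t


def spring(r, p, t):
    return -r                               # hypothetical force law F = -k r with k = 1


for dt in (0.1, 0.01, 0.001):
    steps = round(1.0 / dt)                 # predict the state at t = 1
    r, p, t = euler_predict(spring, 1.0, 1.0, 0.0, 0.0, dt, steps)
    print(dt, r, abs(r - math.cos(1.0)))    # error shrinks roughly in proportion to dt
```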

Newton’s second law of motion,

$$
\frac{d}{dt}\left(m\frac{d\mathbf{r}}{dt}\right) = \mathbf{F}\left(\mathbf{r}, \frac{d\mathbf{r}}{dt}, t\right),
$$


is an example of an ordinary differential equation (ODE). A dependent variable $\mathbf{r}$ is determined from an equation involving a single independent variable $t$, the dependent variable $\mathbf{r}$, and its various derivatives.

In (point) particle mechanics there is only one independent variable, leading to ODEs. In other areas of physics, however, in which extended objects such as fields are studied, variations with respect to position are also present. Partial derivatives with respect to coordinate variables show up in the differential equations, which are therefore called partial differential equations (PDEs). For instance, electrostatics studies time-independent scalar fields, such as potentials, and vector fields, such as electrostatic fields. There the law is described by Poisson’s equation, $\nabla^2\Phi(\mathbf{r}) = -4\pi\rho(\mathbf{r})$, where $\Phi$ is the electrostatic potential and $\rho$ is the volume charge density. Other PDEs occurring in mathematical physics include the heat equation, describing the transfer of heat; the wave equation, describing the propagation of various kinds of waves; and the Schrödinger equation, describing nonrelativistic quantum mechanical phenomena.

In fact, except for the laws of particle mechanics and electrical circuits, in which the only independent variable is time, almost all laws of physics are described by PDEs. We shall not study PDEs in their full generality, but concentrate on the simplest ones, which are encountered most frequently in idealized physical applications. The method of solution that works for all these equations is the separation of variables, whereby a PDE is turned into a number of ODEs.

Before embarking on the separation of variables, we need to formalize the discussion above. An ordinary or a partial DE will provide a unique solution to a physical problem only if the initial or starting value of the solution is known. We refer to this information as the boundary conditions, or BCs for short. For ODEs, boundary conditions amount to the specification of one or more properties of the solution at an initial time; that is why, for ODEs, one speaks of initial conditions. BCs for PDEs involve specification of the solution on a surface (or a curve, if the PDE has only two variables).


22.1 Separation of Variables 

We list here the PDEs encountered in undergraduate courses and initiate their transformation into ODEs. Let us start with the simplest PDE arising in electrostatic problems, the Poisson equation, derived in Chapter 15,

$$
\nabla^2\Phi(\mathbf{r}) = -4\pi\rho(\mathbf{r}). \tag{22.1}
$$

In vacuum, where $\rho(\mathbf{r}) = 0$, Equation (22.1) reduces to Laplace’s equation,

$$
\nabla^2\Phi(\mathbf{r}) = 0. \tag{22.2}
$$

Many electrostatic problems involve conductors held at constant potentials and situated in a vacuum. In the space between such conducting surfaces, the electrostatic potential obeys Equation (22.2).
Next in complexity is the heat equation, whose most simplified version (the one studied here) is

$$
\frac{\partial T}{\partial t} = k^2\nabla^2 T(\mathbf{r}, t), \tag{22.3}
$$

where $T$ is the temperature and $k$ is a real constant characterizing the medium in which heat is flowing.

Probably one of the most frequently recurring PDEs in mathematical physics is the wave equation,

$$
\nabla^2\Psi - \frac{1}{c^2}\frac{\partial^2\Psi}{\partial t^2} = 0. \tag{22.4}
$$

This equation (or its simplification to lower dimensions) is applied to the vibration of strings and drums, the propagation of sound in gases, solids, and liquids, the propagation of disturbances in plasmas, and the propagation of electromagnetic waves.

The Schrödinger equation, describing nonrelativistic quantum phenomena, is

$$
-\frac{\hbar^2}{2m}\nabla^2\Psi + V(\mathbf{r})\Psi = -i\hbar\frac{\partial\Psi}{\partial t}, \tag{22.5}
$$

where $m$ is the mass of a subatomic particle, $\hbar$ is Planck’s constant (divided by $2\pi$), $V$ is the potential energy of the particle, and $|\Psi(\mathbf{r}, t)|^2$ is the probability density of finding the particle at $\mathbf{r}$ at time $t$.

Equations (22.3)–(22.5) have partial derivatives with respect to time. As a first step toward solving these PDEs, let us separate the time variable. We will denote the unknown functions in all these equations by the generic symbol $\Psi(\mathbf{r}, t)$.

The separation of variables starts with separating the $\mathbf{r}$ and $t$ dependence into factors:¹

$$
\Psi(\mathbf{r}, t) = R(\mathbf{r})T(t).
$$

This factorization permits us to separate the two operations of space differentiation and time differentiation. As an illustration, we separate the time and space dependence for the Schrödinger equation; the other equations are done similarly. Substituting for $\Psi$, we get

$$
-\frac{\hbar^2}{2m}\nabla^2(RT) + V(\mathbf{r})(RT) = -i\hbar\frac{\partial}{\partial t}(RT)
$$

or

$$
-\frac{\hbar^2}{2m}T\nabla^2 R + V(\mathbf{r})(RT) = -i\hbar R\frac{dT}{dt},
$$

where we have used ordinary derivatives for $T$ because, by assumption, it is a function of a single variable. Dividing both sides by $RT$ yields

$$
-\frac{\hbar^2}{2m}\frac{1}{R}\nabla^2 R + V(\mathbf{r}) = -i\hbar\frac{1}{T}\frac{dT}{dt}. \tag{22.6}
$$

¹ Note that there is no a priori reason why the basic assumption underlying the separation of variables is legitimate. After all, we cannot write $\sin(xt)$ as a product $f(x)g(t)$. However, in the cases of physical interest treated in this book, the separation of variables works.

Now comes the crucial step in the process of the separation of variables. The LHS of Equation (22.6) is a function of position alone, and the RHS is a function of time alone. Since $\mathbf{r}$ and $t$ are independent variables, the only way that (22.6) can hold is for both sides to be constant, say $a$:

$$
-\frac{\hbar^2}{2m}\frac{1}{R}\nabla^2 R + V(\mathbf{r}) = a \quad\Rightarrow\quad -\frac{\hbar^2}{2m}\nabla^2 R + V(\mathbf{r})R = aR
$$

and

$$
-i\hbar\frac{1}{T}\frac{dT}{dt} = a \quad\Rightarrow\quad \frac{dT}{dt} = \frac{ia}{\hbar}T. \tag{22.7}
$$
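Equation (22.7) is a first-order linear ODE with a constant coefficient, so it can be integrated at once; this routine step is not written out in the text:

$$
\frac{dT}{T} = \frac{ia}{\hbar}\,dt \quad\Longrightarrow\quad T(t) = T(0)\,e^{iat/\hbar}, \qquad \Psi(\mathbf{r}, t) = T(0)\,R(\mathbf{r})\,e^{iat/\hbar}.
$$

For real $a$, the factor $e^{iat/\hbar}$ has unit modulus, so $|\Psi(\mathbf{r}, t)|^2 = |T(0)|^2|R(\mathbf{r})|^2$ does not change with time.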


We have reduced the original time-dependent Schrödinger equation, a PDE, to an ODE involving only time and a PDE involving only the position variables. Most problems of elementary mathematical physics have the same property, i.e., they are completely equivalent to Equation (22.7) plus the equation before it, which we write generically as

$$
\nabla^2 R + f(\mathbf{r})R = 0, \tag{22.8}
$$

where we have simplified the notation by including $a$ in the function $f$. For the Schrödinger equation, $f(\mathbf{r}) = (2m/\hbar^2)[a - V(\mathbf{r})]$.
The foregoing discussion is summarized in this statement:
The foregoing discussion is summarized in this statement: 


Box 22.1.1. The time-dependent PDEs of mathematical physics can be reduced to an ODE in the time variable and the PDE given in Equation (22.8). For those PDEs involving second time derivatives, such as the wave equation, the time equation that takes the place of (22.7) will be a second-order ODE.


With the exception of Poisson’s equation, in all the foregoing equations the term on the RHS is zero. We will restrict ourselves to this so-called homogeneous case² and rewrite (22.8) as

$$
\nabla^2\Psi(\mathbf{r}) + f(\mathbf{r})\Psi(\mathbf{r}) = 0. \tag{22.9}
$$

The rest of this chapter is devoted to the study of this equation in various coordinate systems.


22.2 Separation in Cartesian Coordinates 


In Cartesian coordinates, Equation (22.9) becomes

$$
\frac{\partial^2\Psi}{\partial x^2} + \frac{\partial^2\Psi}{\partial y^2} + \frac{\partial^2\Psi}{\partial z^2} + f(x, y, z)\Psi = 0.
$$


² The most elegant way of solving inhomogeneous PDEs is the method of Green’s functions, of which we shall have a brief discussion in Chapter 29. For a thorough discussion of Green’s functions, see Hassani, S. Mathematical Physics: A Modern Introduction to Its Foundations, Springer-Verlag, 1999, Part VI.
As in the case of the separation of the time variable, we assume that we can separate the dependence on the various coordinates and write

$$
\Psi(x, y, z) = X(x)Y(y)Z(z).
$$

Then the above PDE yields

$$
YZ\frac{d^2X}{dx^2} + XZ\frac{d^2Y}{dy^2} + XY\frac{d^2Z}{dz^2} + f(x, y, z)XYZ = 0.
$$

Dividing by $XYZ$ gives

$$
\frac{1}{X}\frac{d^2X}{dx^2} + \frac{1}{Y}\frac{d^2Y}{dy^2} + \frac{1}{Z}\frac{d^2Z}{dz^2} + f(x, y, z) = 0. \tag{22.10}
$$


This equation is almost separated. The first term is a function of $x$ alone, the second of $y$ alone, and the third of $z$ alone. However, the last term, in general, mixes the coordinates. The only way the separation can become complete is for the last term to be separated as well, that is, expressed as a sum of three functions, each depending on a single coordinate.³ In such a special case, writing $f(x, y, z) = f_1(x) + f_2(y) + f_3(z)$, we obtain

$$
\frac{1}{X}\frac{d^2X}{dx^2} + \frac{1}{Y}\frac{d^2Y}{dy^2} + \frac{1}{Z}\frac{d^2Z}{dz^2} + f_1(x) + f_2(y) + f_3(z) = 0
$$

or

$$
\left[\frac{1}{X}\frac{d^2X}{dx^2} + f_1(x)\right] + \left[\frac{1}{Y}\frac{d^2Y}{dy^2} + f_2(y)\right] + \left[\frac{1}{Z}\frac{d^2Z}{dz^2} + f_3(z)\right] = 0.
$$

The first term on the LHS depends on x alone, the second on y alone, and the third on z alone. Since the sum of these three terms is a constant (zero), independent of all variables, each term must be a constant. Denoting the constant corresponding to the ith term by −αᵢ, we obtain


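$$\frac{1}{X}\frac{d^2X}{dx^2} + f_1(x) = -\alpha_1, \qquad \frac{1}{Y}\frac{d^2Y}{dy^2} + f_2(y) = -\alpha_2, \qquad \frac{1}{Z}\frac{d^2Z}{dz^2} + f_3(z) = -\alpha_3,$$

which can be reexpressed as

$$\frac{d^2X}{dx^2} + [f_1(x) + \alpha_1]X = 0, \qquad \frac{d^2Y}{dy^2} + [f_2(y) + \alpha_2]Y = 0, \qquad \frac{d^2Z}{dz^2} + [f_3(z) + \alpha_3]Z = 0, \qquad \alpha_1 + \alpha_2 + \alpha_3 = 0. \tag{22.11}$$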


If f(x, y, z) happens to be a constant C, then the first three terms of Equation (22.10) can be taken to be respectively −α₁, −α₂, and −α₃, leading to

$$\frac{d^2X}{dx^2} + \alpha_1X = 0, \qquad \frac{d^2Y}{dy^2} + \alpha_2Y = 0, \qquad \frac{d^2Z}{dz^2} + \alpha_3Z = 0, \qquad \alpha_1 + \alpha_2 + \alpha_3 = C. \tag{22.12}$$


3 This is where the limitation of the method of the separation of variables becomes 
evident. However, surprisingly, all physical applications, at our level of treatment, involve 
functions that are indeed separated. 


These equations constitute the most general set of ODEs resulting from the 
separation of the PDE of Equation (22.9) in Cartesian coordinates. 
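As a quick numerical sanity check of Equation (22.12), the Python sketch below (an illustration, not part of the original derivation) takes the assumed values k₁ = 1, k₂ = 2, k₃ = 1/2 with αᵢ = kᵢ², so that C = α₁ + α₂ + α₃ = 5.25. It then uses central differences to confirm that the product X(x)Y(y)Z(z) satisfies ∇²Ψ + CΨ = 0. The function names and the step size h are hypothetical choices.

```python
import math

# Assumed values (not from the text): X'' + k1^2 X = 0, etc., so alpha_i = k_i^2.
k1, k2, k3 = 1.0, 2.0, 0.5
C = k1**2 + k2**2 + k3**2  # alpha_1 + alpha_2 + alpha_3 = C, as in Eq. (22.12)


def psi(x, y, z):
    """Product solution X(x)Y(y)Z(z); each factor solves its own ODE in (22.12)."""
    return math.sin(k1 * x) * math.cos(k2 * y) * math.sin(k3 * z)


def laplacian(f, x, y, z, h=1e-3):
    """Central-difference approximation of the Laplacian of f at (x, y, z)."""
    c = f(x, y, z)
    return ((f(x + h, y, z) - 2 * c + f(x - h, y, z))
            + (f(x, y + h, z) - 2 * c + f(x, y - h, z))
            + (f(x, y, z + h) - 2 * c + f(x, y, z - h))) / h**2


def residual(x, y, z):
    """Left-hand side of Eq. (22.9) with f(x, y, z) = C; should be close to zero."""
    return laplacian(psi, x, y, z) + C * psi(x, y, z)


for point in [(0.3, 0.7, 1.1), (1.5, -0.4, 2.0)]:
    print(point, residual(*point))
```

The residuals are of the order of the finite-difference error (about h²). They are not exactly zero.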


Example 22.2.1. Let us consider a few cases for which (22.11) or (22.12) is 
applicable. 

(a) In electrostatics, separation of Laplace’s equation, for which f(r) = 0, leads to these ODEs:

$$\frac{d^2X}{dx^2} + \alpha_1X = 0, \qquad \frac{d^2Y}{dy^2} + \alpha_2Y = 0, \qquad \frac{d^2Z}{dz^2} - (\alpha_1 + \alpha_2)Z = 0.$$

The solutions to these equations are trigonometric or hyperbolic (exponential) functions, determined from the boundary conditions (conducting surfaces). The unsymmetrical treatment of the three coordinates (the plus sign in front of the first two constants and a minus sign in front of the third) is not dictated by the above equations. There is a freedom in the choice of sign in these equations. However, the boundary conditions will force the constants to adapt to values appropriate to the physical situation at hand.

(b) In quantum mechanics the time-independent Schrödinger equation for a free particle in three dimensions is

$$\nabla^2\Psi + \frac{2mE}{\hbar^2}\Psi = 0.$$

Separation of variables yields the ODEs of Equation (22.12) with

$$\alpha_1 + \alpha_2 + \alpha_3 = \frac{2mE}{\hbar^2}.$$

After time is separated, the heat and wave equations also yield equations similar to (22.12).

(c) In quantum mechanics the time-independent Schrödinger equation for a three-dimensional isotropic harmonic oscillator is

$$\nabla^2\Psi - \left(\frac{m^2\omega^2}{\hbar^2}r^2 - \frac{2mE}{\hbar^2}\right)\Psi = 0.$$

Thus,

$$f(\mathbf{r}) = -\frac{m^2\omega^2}{\hbar^2}r^2 + \frac{2mE}{\hbar^2} = -\frac{m^2\omega^2}{\hbar^2}(x^2 + y^2 + z^2) + \frac{2mE}{\hbar^2}.$$

Equation (22.11) then yields

$$\frac{d^2X}{dx^2} - \frac{m^2\omega^2}{\hbar^2}x^2X + \alpha_1X = 0, \qquad \frac{d^2Y}{dy^2} - \frac{m^2\omega^2}{\hbar^2}y^2Y + \alpha_2Y = 0, \qquad \frac{d^2Z}{dz^2} - \frac{m^2\omega^2}{\hbar^2}z^2Z + \alpha_3Z = 0,$$

with α₁ + α₂ + α₃ = 2mE/ħ². Here the constant 2mE/ħ² has been absorbed into the αᵢ.
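The oscillator equations in part (c) can be checked numerically. The sketch below assumes the units m = ω = ħ = 1. It takes the Gaussian X(x) = e^{−x²/2} (the known ground state, quoted here rather than derived in this chapter) and shows that it solves the X equation with a constant α₁ = 1. Since the Y and Z equations are identical, α₁ + α₂ + α₃ = 3, which gives E = 3ħω/2. The helper `alpha_from` is a hypothetical name used only for this illustration.

```python
import math

# Assumed units for illustration: m = omega = hbar = 1, so the X equation reads
# X'' - x^2 X + alpha_1 X = 0 and alpha_1 + alpha_2 + alpha_3 = 2 m E / hbar^2 = 2 E.


def X(x):
    """Trial factor exp(-x^2 / 2) (the Gaussian ground state, quoted, not derived here)."""
    return math.exp(-x * x / 2)


def alpha_from(fn, x, h=1e-4):
    """Solve X'' - x^2 X + alpha X = 0 for alpha at one point, using a central difference."""
    second = (fn(x + h) - 2 * fn(x) + fn(x - h)) / h**2
    return -(second - x * x * fn(x)) / fn(x)


alphas = [alpha_from(X, x) for x in (0.0, 0.5, 1.3)]
print(alphas)                # about 1 at every x: alpha is a true constant
alpha_total = 3 * alphas[0]  # the Y and Z equations are identical to the X equation
E = alpha_total / 2          # from alpha_1 + alpha_2 + alpha_3 = 2 E in these units
print(E)                     # about 1.5, i.e. E = (3/2) hbar omega
```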
22.3 Separation in Cylindrical Coordinates

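Equation (22.9) takes the following form in cylindrical coordinates:⁴

$$\frac{1}{\rho}\frac{\partial}{\partial\rho}\left(\rho\frac{\partial\Psi}{\partial\rho}\right) + \frac{1}{\rho^2}\frac{\partial^2\Psi}{\partial\varphi^2} + \frac{\partial^2\Psi}{\partial z^2} + f(\rho,\varphi,z)\Psi = 0.$$

To separate the variables, we write Ψ(ρ, φ, z) = R(ρ)S(φ)Z(z), substitute in the general equation, and divide both sides by RSZ to obtain

$$\frac{1}{R}\frac{1}{\rho}\frac{d}{d\rho}\left(\rho\frac{dR}{d\rho}\right) + \frac{1}{S}\frac{1}{\rho^2}\frac{d^2S}{d\varphi^2} + \frac{1}{Z}\frac{d^2Z}{dz^2} + f(\rho,\varphi,z) = 0.$$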

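We shall consider only the special (but important) case in which f(ρ, φ, z) is a constant λ. In that case, the equation becomes

$$\underbrace{\left[\frac{1}{R}\frac{1}{\rho}\frac{d}{d\rho}\left(\rho\frac{dR}{d\rho}\right) + \frac{1}{\rho^2}\frac{1}{S}\frac{d^2S}{d\varphi^2}\right]}_{\text{function of }\rho\text{ and }\varphi\text{ only}} + \underbrace{\left[\frac{1}{Z}\frac{d^2Z}{dz^2}\right]}_{\text{fn. of }z} + \lambda = 0.$$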


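The sum of the first two terms is independent of z, so the third term must be as well. We thus get

$$\frac{1}{Z}\frac{d^2Z}{dz^2} = \lambda_1 \qquad\text{and}\qquad \frac{1}{R}\frac{1}{\rho}\frac{d}{d\rho}\left(\rho\frac{dR}{d\rho}\right) + \frac{1}{\rho^2}\frac{1}{S}\frac{d^2S}{d\varphi^2} + \lambda_1 + \lambda = 0.$$

Multiplying this equation by ρ² yields

$$\underbrace{\frac{\rho}{R}\frac{d}{d\rho}\left(\rho\frac{dR}{d\rho}\right) + (\lambda_1 + \lambda)\rho^2}_{\text{function of }\rho\text{ only}} + \underbrace{\frac{1}{S}\frac{d^2S}{d\varphi^2}}_{\text{function of }\varphi\text{ only}} = 0.$$

Since the first term is a function of ρ only and the second a function of φ only, both terms must be constants whose sum vanishes. Thus,

$$\frac{1}{S}\frac{d^2S}{d\varphi^2} = \mu, \qquad \frac{\rho}{R}\frac{d}{d\rho}\left(\rho\frac{dR}{d\rho}\right) + (\lambda_1 + \lambda)\rho^2 + \mu = 0. \tag{22.13}$$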


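Putting together all of the above, we conclude that when Equation (22.9) is separable in cylindrical coordinates and f(r) = λ, it will separate into the following three ODEs:

$$\frac{d^2Z}{dz^2} - \lambda_1Z = 0, \qquad \frac{d^2S}{d\varphi^2} - \mu S = 0, \qquad \frac{d}{d\rho}\left(\rho\frac{dR}{d\rho}\right) + \left[(\lambda_1 + \lambda)\rho + \frac{\mu}{\rho}\right]R = 0, \tag{22.14}$$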


⁴ See Chapter 16 for the expression of ∇² in spherical and cylindrical coordinate systems.
Jean Le Rond d’Alembert
1717-1783


where in rewriting the second equation in (22.13), we multiplied both sides of the equation by R and divided it by ρ. The last equation of (22.14) is called the Bessel differential equation. This equation shows up in electrostatic and heat-transfer problems with cylindrical geometry and in problems involving two-dimensional wave propagation, as in drumheads.
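To see the Bessel equation at work, the following sketch makes two assumed choices: λ₁ + λ = k² > 0, and μ = −n² with integer n, so that S″ + n²S = 0 has the single-valued solutions e^{±inφ}. With these choices, the last equation of (22.14) becomes (ρR′)′ + (k²ρ − n²/ρ)R = 0. The code builds Jₙ from its standard power series and checks numerically that R(ρ) = Jₙ(kρ) makes this residual vanish. The function names and step size are illustrative.

```python
import math


def bessel_j(n, x, terms=40):
    """J_n(x) for integer n >= 0, summed from its power series."""
    return sum((-1) ** m / (math.factorial(m) * math.factorial(m + n)) * (x / 2) ** (2 * m + n)
               for m in range(terms))


def bessel_residual(n, k, rho, h=1e-4):
    """Residual of d/drho(rho dR/drho) + (k^2 rho - n^2/rho) R for R(rho) = J_n(k rho).

    This is the last equation of (22.14) with the assumed choices lambda_1 + lambda = k^2
    and mu = -n^2.
    """
    def R(r):
        return bessel_j(n, k * r)

    def flux(r):
        return r * (R(r + h) - R(r - h)) / (2 * h)  # rho dR/drho

    d_flux = (flux(rho + h) - flux(rho - h)) / (2 * h)
    return d_flux + (k * k * rho - n * n / rho) * R(rho)


print([bessel_residual(2, 1.5, r) for r in (0.5, 1.0, 3.0)])  # all close to zero
```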


Jean Le Rond d’Alembert was the illegitimate son of a famous salon hostess of eighteenth-century Paris and a cavalry officer. Abandoned by his mother, d’Alembert was raised by a foster family and later educated by the arrangement of his father at a nearby church-sponsored school, in which he received instruction in the classics and above-average instruction in mathematics. After studying law and medicine, he finally chose to pursue a career in mathematics. In the 1740s he joined the ranks of the philosophes, a growing group of deistic and materialistic thinkers and writers who actively questioned the social and intellectual standards of the day. He traveled little (he left France only once, to visit the court of Frederick the Great), preferring instead the company of his friends in the salons, among whom he was well known for his wit and laughter.

d’Alembert turned his mathematical and philosophical talents to many of the outstanding scientific problems of the day, with mixed success. Perhaps his most famous scientific work, entitled Traité de dynamique, shows his appreciation that a revolution was taking place in the science of mechanics: the formalization of the principles stated by Newton into a rigorous mathematical framework. Later, d’Alembert produced a treatise on fluid mechanics, a paper dealing with vibrating strings, and a skillful treatment of celestial mechanics. d’Alembert is also credited with the use of the first partial differential equation as well as the first solution to such an equation using separation of variables.

Much of the work for which d’Alembert is remembered occurred outside mathematical physics. He was chosen as the science editor of the Encyclopédie, and his lengthy Discours préliminaire in that volume is considered one of the defining documents of the Enlightenment. Other works included writings on law, religion, and music.


22.4 Separation in Spherical Coordinates 


By far the most commonly used coordinate system in mathematical physics is the spherical coordinate system. This is because many forces, potential energies, and geometries encountered in Nature are spherically symmetric. One of the consequences of this spherical symmetry is that the function f(r) is a function of r and not of the angles. We shall assume this to be true in this section.

In spherical coordinates, Equation (22.9) becomes [see Equation (16.19)] 


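$$\frac{1}{r^2}\frac{\partial}{\partial r}\left(r^2\frac{\partial\Psi}{\partial r}\right) + \frac{1}{r^2\sin\theta}\frac{\partial}{\partial\theta}\left(\sin\theta\frac{\partial\Psi}{\partial\theta}\right) + \frac{1}{r^2\sin^2\theta}\frac{\partial^2\Psi}{\partial\varphi^2} + f(r)\Psi = 0. \tag{22.15}$$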
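To separate this equation means to write Ψ(r, θ, φ) = R(r)Θ(θ)Φ(φ). If we substitute this in Equation (22.15) and note that each differentiation acts on only one of the three functions, we get

$$\Theta\Phi\frac{1}{r^2}\frac{d}{dr}\left(r^2\frac{dR}{dr}\right) + \frac{R\Phi}{r^2\sin\theta}\frac{d}{d\theta}\left(\sin\theta\frac{d\Theta}{d\theta}\right) + \frac{R\Theta}{r^2\sin^2\theta}\frac{d^2\Phi}{d\varphi^2} + f(r)R\Theta\Phi = 0.$$

Now divide both sides by RΘΦ and multiply by r² to obtain

$$\underbrace{\frac{1}{R}\frac{d}{dr}\left(r^2\frac{dR}{dr}\right) + r^2f(r)}_{\text{function of }r\text{ alone}} + \underbrace{\frac{1}{\Theta\sin\theta}\frac{d}{d\theta}\left(\sin\theta\frac{d\Theta}{d\theta}\right) + \frac{1}{\Phi\sin^2\theta}\frac{d^2\Phi}{d\varphi^2}}_{\text{function of }\theta\text{ and }\varphi\text{ only}} = 0.$$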


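Since each one of the two terms is a function of different variables, each must be a constant, and the two constants must add up to zero. Therefore, we have

$$\frac{1}{R}\frac{d}{dr}\left(r^2\frac{dR}{dr}\right) + r^2f(r) = \alpha, \qquad \frac{1}{\Theta\sin\theta}\frac{d}{d\theta}\left(\sin\theta\frac{d\Theta}{d\theta}\right) + \frac{1}{\Phi\sin^2\theta}\frac{d^2\Phi}{d\varphi^2} = -\alpha.$$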

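The second equation can be further separated. We add α to both sides and multiply the resulting equation by sin²θ to obtain

$$\underbrace{\frac{\sin\theta}{\Theta}\frac{d}{d\theta}\left(\sin\theta\frac{d\Theta}{d\theta}\right) + \alpha\sin^2\theta}_{\text{function of }\theta\text{ alone; set }=\,\beta} + \frac{1}{\Phi}\frac{d^2\Phi}{d\varphi^2} = 0.$$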


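We have thus obtained three ODEs in three variables. We rewrite these ODEs in the following equations:

$$\begin{aligned}
&\frac{1}{r^2}\frac{d}{dr}\left(r^2\frac{dR}{dr}\right) + \left[f(r) - \frac{\alpha}{r^2}\right]R = 0,\\
&\frac{1}{\sin\theta}\frac{d}{d\theta}\left(\sin\theta\frac{d\Theta}{d\theta}\right) + \left(\alpha - \frac{\beta}{\sin^2\theta}\right)\Theta = 0,\\
&\frac{d^2\Phi}{d\varphi^2} + \beta\Phi = 0.
\end{aligned} \tag{22.16}$$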


The first equation is called the radial equation, the second the polar equation, and the third the azimuthal equation. The radial equation can be further simplified by making the substitution R = u/r. This gives

$$\frac{d^2u}{dr^2} + \left[f(r) - \frac{\alpha}{r^2}\right]u = 0. \tag{22.17}$$
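The key step behind (22.17) is the identity (1/r²)(r²R′)′ = u″/r for R = u/r. Once this holds, the term [f(r) − α/r²]R = [f(r) − α/r²]u/r carries over unchanged, and multiplying through by r gives (22.17). The sketch below checks that identity numerically for an arbitrary smooth test function u, chosen here only for illustration.

```python
import math


def u(r):
    """Arbitrary smooth test function (an assumption made for this check)."""
    return math.sin(2 * r) * math.exp(-r / 3)


def R(r):
    """The substitution R = u / r."""
    return u(r) / r


def d(f, r, h=1e-4):
    """Central-difference first derivative of f at r."""
    return (f(r + h) - f(r - h)) / (2 * h)


def radial_operator(r):
    """(1/r^2) d/dr (r^2 dR/dr), the derivative part of the radial equation in (22.16)."""
    return d(lambda s: s * s * d(R, s), r) / r**2


def reduced_operator(r):
    """u''/r, which is what the substitution R = u/r should turn the operator into."""
    return d(lambda s: d(u, s), r) / r


for r in (0.7, 1.9, 4.2):
    print(r, radial_operator(r), reduced_operator(r))  # the two columns agree
```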


Our task in this chapter was to separate the PDEs most frequently encountered in undergraduate mathematical physics into ODEs, and we have done this in the three coordinate systems regularly used in applications. We shall return to a thorough treatment of the ODEs so obtained later in the book, and in the process we shall be introduced to the so-called special functions that came into being in the nineteenth century as a result of the then newly discovered technique of the separation of variables.

22.5 Problems 

22.1. Assume that two functions Φ₁ and Φ₂ satisfy the same Poisson equation. Show that

(a) Φ defined by Φ = Φ₁ − Φ₂ satisfies Laplace’s equation;

(b) ∇ · (Φ∇Φ) = |∇Φ|².

22.2. Separate the solution of the heat equation (22.3), T(r, t) = R(r)τ(t), and show that

(a) the solution to the time equation is

$$\tau(t) = Ae^{-ak^2t},$$

(b) in which case, the space part must satisfy the following PDE:

$$\nabla^2R + k^2R = 0.$$

22.3. Show that any function of the form f(k · r ± ωt) satisfies the wave equation (22.4) if ω = c|k|.

22.4. Separate the solution of the wave equation (22.4), Ψ(r, t) = R(r)T(t), and show that

(a) the solution to the time equation is

$$T(t) = A\cos\omega t + B\sin\omega t,$$

(b) and the space part must satisfy the following PDE:

$$\nabla^2R + k^2R = 0,$$

where k = ω/c.

22.5. Provide the details of the derivation of Equation (22.16). 

22.6. By substituting R = u/r in the radial DE of spherical coordinates, 
show that it reduces to Equation (22.17). 




Chapter 23 

First-Order Differential Equations


The last chapter showed that all PDEs discussed there resulted in ODEs of second order, i.e., differential equations involving second derivatives. Thus, treating first-order DEs (FODEs) may seem irrelevant. However, a second-order DE (SODE) can sometimes be expressed in terms of first derivatives. For example, take Newton’s second law of motion along a straight line (free fall, say): m d²x/dt² = F. If we write this in terms of velocity, we obtain m dv/dt = F. If F is a function of v alone (as in a fall with air resistance), we then have a FODE. FODEs also arise in areas of physics besides mechanics. Therefore, it is worthwhile to study them here.


23.1 Normal Form of a FODE 


The most general FODE is of the form G(x, y, y′) = 0, where G is some function of three variables. We can find y′ (the derivative of y) as a function of x and y if the function G(x₁, x₂, x₃) is sufficiently well behaved. In that case, we have

$$y' = \frac{dy}{dx} = F(x,y), \tag{23.1}$$

which is said to be a normal FODE.


Example 23.1.1. There are three special cases of Equation (23.1) that lead immediately to a solution.

(a) If F(x, y) is independent of y, then y′ = g(x), and the most general solution can be written as y = f(x) = C + $\int_a^x g(t)\,dt$, where C = f(a).

(b) If F(x, y) is independent of x, then dy/dx = h(y), and

$$\frac{dy}{h(y)} = dx \;\Longrightarrow\; \int_c^y\frac{dt}{h(t)} - x + a = 0 \;\Longrightarrow\; H(y) - x + a = 0, \qquad H(y) \equiv \int_c^y\frac{dt}{h(t)},$$

embodies a solution. That is, H(y) = x − a can be solved for y in terms of x, say y = f(x), and this y will be a solution of the DE. Note that y|ₓ₌ₐ = f(a) = c.

(c) The third special case is really a generalization of the first two. If F(x, y) = g(x)h(y), then y′ = g(x)h(y), or dy/h(y) = g(x) dx, and

$$\int_c^y\frac{dt}{h(t)} = \int_a^x g(t)\,dt \tag{23.2}$$

is an implicit solution.

The example above contains information important enough to be “boxed.”

Box 23.1.1. A differential equation is considered to be solved if its solution can be obtained by solving an algebraic equation involving integrals of known functions. Whether these integrals can be done in closed form or not is irrelevant.




So, although we may not be able to actually perform the integrations in (23.2), we consider the DE solved because, in principle, Equation (23.2) gives y as an (implicit) function of x.

As Example 23.1.1 shows, the solutions to a FODE are usually obtained in an implicit form, as a function u of two variables such that the solution y can be found by solving u(x, y) = 0 for y. Included in u(x, y) is an arbitrary constant related to the initial conditions. The equation u(x, y) = 0 defines a curve in the xy-plane, which depends on the (hidden) constant in u(x, y). Since different constants give rise to different curves, it is convenient to separate the constant and write u(x, y) = C. This leads to the concept of an integral of a differential equation.

Definition 23.1.1. An integral of a normal FODE [Equation (23.1)] is a function of two variables u(x, y) such that u(x, f(x)) is a constant for all possible values of x whenever y = f(x) is a solution of the differential equation.

The integrals of differential equations are encountered often in physics. If x is replaced by t (time), then the differential equation describes the motion of a physical system, and a solution, y = f(t), can be written implicitly as u(t, y) = C, where u is an integral of the differential equation. The equation u(t, y) = C describes a curve in the ty-plane on which the value of the function u(t, y) remains unchanged for all t. Thus, u(t, y), the integral of the FODE, is also called a constant of motion.

Example 23.1.2. Consider a point particle moving under the influence of a force depending on position only. Denoting the position¹ by x and the velocity by v, we have, by Newton’s second law, m dv/dt = F(x). Using the chain rule, dv/dt = (dv/dx)(dx/dt) = v dv/dx, we obtain

$$mv\frac{dv}{dx} = F(x) \;\Longrightarrow\; mv\,dv = F(x)\,dx, \tag{23.3}$$

¹ Here we are restricting the motion to one dimension.







which is easily integrated to

$$\tfrac{1}{2}mv^2 = \int F(x)\,dx + C = -V(x) + C. \tag{23.4}$$

The potential energy V(x) = −∫F(x) dx has been introduced as an indefinite integral. We can write Equation (23.4) as

$$\tfrac{1}{2}mv^2 + V(x) = C. \tag{23.5}$$

Thus, the integral of Equation (23.3) is u(x, v) = ½mv² + V(x), which is the expression for the energy of the one-dimensional motion of a particle experiencing the potential V(x). If v is a solution of Equation (23.3), then u(x, v) = constant. Since a solution of Equation (23.3) describes a possible motion of the particle, Equation (23.5) implies that the energy of a particle does not change in the course of its motion. This statement is the conservation of (mechanical) energy. ■


23.2 Integrating Factors 

Let D be a region in the xy-plane, and let M(x, y) and N(x, y) be continuous functions of x and y defined on D. The differential M dx + N dy is exact if, for arbitrary points P₁ and P₂ of D, the line integral

$$\int_{P_1}^{P_2}\left[M(x,y)\,dx + N(x,y)\,dy\right]$$

is independent of the path joining the two points. This condition is equivalent to saying that the line integral of the integrand around any closed loop in D vanishes. A necessary and sufficient condition for exactness is, therefore, that the curl of the vector A = (M, N, 0) be zero.² The vector A is then conservative, and we can define a (potential) function v such that A = ∇v = (∂v/∂x, ∂v/∂y, 0), or

$$dv = \frac{\partial v}{\partial x}dx + \frac{\partial v}{\partial y}dy = M\,dx + N\,dy. \tag{23.6}$$

Thus, M dx + N dy is exact if and only if there exists a function v(x, y) satisfying (23.6), in which case M = ∂v/∂x and N = ∂v/∂y.

Now consider all y’s that satisfy v(x, y) = C for some constant C. Then, since dC = 0, we have

$$0 = dv = M\,dx + N\,dy.$$

It follows that v(x, y) = C is an implicit solution of the differential equation. We therefore have

² The statement is true only if the region D does not contain any singularities of M or N. The region is then called contractible to a point (see Section 14.3).


Theorem 23.2.1. If M(x, y) dx + N(x, y) dy is an exact differential dv in a domain D of the xy-plane, then v(x, y) is an integral of the DE

$$M(x,y)\,dx + N(x,y)\,dy = 0,$$

whose solutions are of the form v(x, y) = C.

We saw above that, for an exact differential, M = ∂v/∂x and N = ∂v/∂y. A necessary consequence of this result is ∂M/∂y = ∂N/∂x. Could this relation be a sufficient condition as well? Consider the function v(x, y) defined by

$$v(x,y) = \int_a^x M(t,y)\,dt + \int_b^y N(a,t)\,dt,$$

and note that, using ∂M/∂y = ∂N/∂x in the second bracket,

$$\begin{aligned}
dv &= \frac{\partial v}{\partial x}dx + \frac{\partial v}{\partial y}dy = \left[\frac{\partial}{\partial x}\int_a^x M(t,y)\,dt\right]dx + \left[\int_a^x\frac{\partial M}{\partial y}(t,y)\,dt + \frac{\partial}{\partial y}\int_b^y N(a,t)\,dt\right]dy\\
&= M(x,y)\,dx + \underbrace{\left[\int_a^x\frac{\partial N}{\partial t}(t,y)\,dt + N(a,y)\right]}_{=\,N(x,y)}dy,
\end{aligned}$$

and v(x, y) indeed satisfies dv = M dx + N dy. It follows that (see Problem 23.1)

Theorem 23.2.2. A necessary and sufficient condition for M dx + N dy to be exact is ∂M/∂y = ∂N/∂x, in which case

$$v(x,y) = \int_a^x M(t,y)\,dt + \int_b^y N(a,t)\,dt$$

is the function such that dv = M dx + N dy.
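Theorem 23.2.2 is constructive, so it can be sketched directly in code. The example below assumes M = 2xy + 1 and N = x² + 3y², for which ∂M/∂y = ∂N/∂x = 2x, and takes the base point (a, b) = (0, 0). It builds v from the two integrals in the theorem using composite Simpson quadrature. It then checks that ∂v/∂x = M and ∂v/∂y = N at a sample point, and compares v with the closed form x²y + x + y³. The example functions and helper names are illustrative assumptions.

```python
def simpson(f, lo, hi, n=200):
    """Composite Simpson rule with n (even) subintervals."""
    h = (hi - lo) / n
    s = f(lo) + f(hi)
    s += 4 * sum(f(lo + i * h) for i in range(1, n, 2))
    s += 2 * sum(f(lo + i * h) for i in range(2, n, 2))
    return s * h / 3


# Assumed example: M = 2xy + 1, N = x^2 + 3y^2, for which dM/dy = dN/dx = 2x.
def M(x, y):
    return 2 * x * y + 1


def N(x, y):
    return x * x + 3 * y * y


a, b = 0.0, 0.0  # base point (a, b) in Theorem 23.2.2


def v(x, y):
    """v(x, y) = int_a^x M(t, y) dt + int_b^y N(a, t) dt."""
    return simpson(lambda t: M(t, y), a, x) + simpson(lambda t: N(a, t), b, y)


# Check dv = M dx + N dy at a sample point by central differences.
x0, y0, h = 0.8, -1.3, 1e-5
dv_dx = (v(x0 + h, y0) - v(x0 - h, y0)) / (2 * h)
dv_dy = (v(x0, y0 + h) - v(x0, y0 - h)) / (2 * h)
print(dv_dx, M(x0, y0))
print(dv_dy, N(x0, y0))
print(v(x0, y0), x0**2 * y0 + x0 + y0**3)  # closed form x^2 y + x + y^3
```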

Not very many FODEs are exact. However, many can be turned into exact FODEs by multiplication by a suitable function. Such a function, if it exists, is called an integrating factor. Thus, if the differential M(x, y) dx + N(x, y) dy is not exact, but

$$\mu(x,y)M(x,y)\,dx + \mu(x,y)N(x,y)\,dy = dv,$$

then μ(x, y) is an integrating factor for the differential equation

$$M(x,y)\,dx + N(x,y)\,dy = 0,$$

whose solution is then v(x, y) = C. Integrating factors are not unique, as the following example illustrates.
Example 23.2.3. The differential x dy − y dx is not exact. Let us see if we can find a function μ(x, y) such that dv = μx dy − μy dx for some v(x, y). We assume that the domain D of the xy-plane in which v is defined is contractible to a point. Then a necessary and sufficient condition for the equation above to hold is

$$\frac{\partial}{\partial x}(\mu x) = \frac{\partial}{\partial y}(-\mu y) \;\Longrightarrow\; x\frac{\partial\mu}{\partial x} + y\frac{\partial\mu}{\partial y} + 2\mu = 0. \tag{23.7}$$

(a) Let us assume that μ is a function of x only. Then Equation (23.7) reduces to x dμ/dx = −2μ, or μ = C/x², where x ≠ 0. In this case we get

$$dv = \frac{C}{x^2}(x\,dy - y\,dx) = C\,d\!\left(\frac{y}{x}\right), \qquad x \neq 0.$$

Thus, as long as x ≠ 0, any function C/x², with arbitrary C, is an integrating factor for x dy − y dx = 0. This integrating factor leads to the solution

$$v = \frac{Cy}{x} = \text{constant}. \tag{23.8}$$

In order to determine the constant, suppose that y = m when x = 1. Then (23.8) determines the constant in terms of m:

$$\frac{Cm}{1} = \text{constant} \;\Longrightarrow\; \text{constant} = Cm.$$

So, (23.8) becomes

$$\frac{Cy}{x} = Cm \;\Longrightarrow\; y = mx.$$

(b) Now let us assume that μ is a function of y only. This leads to the integrating factor μ = C/y², where y ≠ 0. In this case v = −Cx/y is the integral of the DE, and a general solution is of the form Cx/y = constant. If we further impose the condition y(1) = m, we get C/m = constant, which yields

$$\frac{Cx}{y} = \frac{C}{m} \;\Longrightarrow\; y = mx,$$

as in (a).

(c) The reader may verify that

$$\mu = \frac{C}{x^2 + y^2}, \qquad (x,y) \neq (0,0),$$

is also an integrating factor, leading to the integral

$$v = C\tan^{-1}\frac{y}{x} = \text{constant} \;\Longrightarrow\; \frac{y}{x} = \tan\left(\frac{\text{constant}}{C}\right) \equiv C'.$$

Imposing y(1) = m gives C′ = m, so that y = mx as before. ■
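The three integrating factors of this example can be checked against the exactness criterion of Theorem 23.2.2. With M = −μy and N = μx, the differential μ(x dy − y dx) is exact exactly when ∂M/∂y − ∂N/∂x = 0. The sketch below evaluates this defect by central differences at one assumed sample point (1.2, 0.7), taking C = 1. As a control, it also shows that μ = 1 gives a defect of −2, because x dy − y dx itself is not exact.

```python
# Integrating-factor candidates from Example 23.2.3 (with C = 1).
factors = {
    "1/x^2": lambda x, y: 1 / x**2,
    "1/y^2": lambda x, y: 1 / y**2,
    "1/(x^2+y^2)": lambda x, y: 1 / (x**2 + y**2),
    "1 (no factor)": lambda x, y: 1.0,  # control: x dy - y dx itself is not exact
}


def exactness_defect(mu, x, y, h=1e-5):
    """dM/dy - dN/dx for M = -mu*y, N = mu*x; zero iff mu*(x dy - y dx) is exact."""
    def M(s, t):
        return -mu(s, t) * t

    def N(s, t):
        return mu(s, t) * s

    dM_dy = (M(x, y + h) - M(x, y - h)) / (2 * h)
    dN_dx = (N(x + h, y) - N(x - h, y)) / (2 * h)
    return dM_dy - dN_dx


for name, mu in factors.items():
    print(name, exactness_defect(mu, 1.2, 0.7))
```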

The example above is a special case of the general fact that if a differential has one integrating factor, then it has an infinite number of them. Suppose that ν(x, y) is an integrating factor of M dx + N dy, i.e., νM dx + νN dy is an exact differential, say du. Take any differentiable function F(u). Then μ(x, y) = ν(x, y)F′(u) is also an integrating factor. In fact,

$$\mu(M\,dx + N\,dy) = \nu F'(u)(M\,dx + N\,dy) = \frac{dF}{du}\underbrace{(\nu M\,dx + \nu N\,dy)}_{=\,du} = dF.$$
23.3 First-Order Linear Differential Equations

A linear DE is a sum of terms, each of which is the product of a derivative of the dependent variable (say y), with y itself counted as its zeroth derivative, and a function of the independent variable (say x). This sum is set equal to a function of x alone. The highest order of the derivative is called the order of the linear DE. The most general first-order linear differential equation (FOLDE) is

$$p_1(x)y' + p_0(x)y = q(x) \;\Longleftrightarrow\; p_1\,dy + (p_0y - q)\,dx = 0. \tag{23.9}$$

If this equation is to have a solution, then by the argument at the end of the last subsection, it must have at least one integrating factor. Let μ(x, y) be an integrating factor. Then there exists v(x, y) such that

$$dv = \mu(p_0y - q)\,dx + \mu p_1\,dy = 0.$$

The necessary and sufficient condition for this to hold is

$$\frac{\partial}{\partial y}\left[\mu(p_0y - q)\right] = \frac{\partial}{\partial x}(\mu p_1).$$

To simplify the problem, let us assume that μ is a function of x only (we are looking for any integrating factor, not the most general one). Then the above condition leads to the differential equation

$$\mu p_0 = \frac{d}{dx}(\mu p_1) = p_1\frac{d\mu}{dx} + \mu\frac{dp_1}{dx} \tag{23.10}$$

or

$$\frac{d\mu}{\mu} = \frac{p_0}{p_1}\,dx - \frac{dp_1}{p_1}.$$

Integrating both sides gives

$$\ln\mu = \int\frac{p_0}{p_1}\,dx - \ln p_1 + \ln C \;\Longrightarrow\; \mu = \frac{C}{p_1}\exp\left(\int\frac{p_0}{p_1}\,dx\right).$$

Neglecting the unimportant constant of integration, we have found the integrating factor μ = exp(∫p₀ dx/p₁)/p₁. Now multiply both sides of the original equation by μ to obtain

$$\mu p_1y' + \mu p_0y = \mu q. \tag{23.11}$$

With the identity μp₁y′ = (μp₁y)′ − (μp₁)′y and the fact that (μp₁)′ = μp₀ [the first equality of Equation (23.10)], Equation (23.11) becomes

$$\frac{d}{dx}(\mu p_1y) = \mu q \;\Longrightarrow\; \mu p_1y = \int\mu(x)q(x)\,dx + C.$$
Therefore,


Theorem 23.3.1. Any FOLDE of the form p₁(x)y′ + p₀(x)y = q(x), in which p₀, p₁, and q are continuous functions in some interval (a, b) and p₁ does not vanish there, has a general solution

$$y = f(x) = \frac{1}{\mu(x)p_1(x)}\left[C + \int\mu(x)q(x)\,dx\right], \tag{23.12}$$

where C is an arbitrary constant, and

$$\mu(x) = \frac{1}{p_1(x)}\exp\left[\int\frac{p_0(x)}{p_1(x)}\,dx\right]. \tag{23.13}$$
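Equations (23.12) and (23.13) translate directly into a quadrature recipe. In the sketch below, the lower limits of both integrals are fixed at an assumed initial point x₀. Then μp₁ = e^{P(x)} with P(x) = ∫ₓ₀ˣ p₀/p₁ dt, and μq = e^{P}q/p₁, so the constant C equals y(x₀). The function `solve_folde` is a hypothetical helper, and Simpson's rule is an arbitrary choice of quadrature. The test problem x y′ + y = x², y(1) = 1, has the exact solution y = x²/3 + 2/(3x), which follows from (xy)′ = x².

```python
import math


def simpson(f, lo, hi, n=200):
    """Composite Simpson rule with n (even) subintervals."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    total += 4 * sum(f(lo + i * h) for i in range(1, n, 2))
    total += 2 * sum(f(lo + i * h) for i in range(2, n, 2))
    return total * h / 3


def solve_folde(p1, p0, q, x0, y0, x):
    """Evaluate Eq. (23.12) for p1 y' + p0 y = q with y(x0) = y0.

    With mu from Eq. (23.13) and lower limit x0, mu*p1 = exp(P) where
    P(x) = int_{x0}^x p0/p1, and mu*q = exp(P) q / p1.  Assumes p1 != 0 on [x0, x].
    """
    def P(s):
        return simpson(lambda t: p0(t) / p1(t), x0, s)

    integral = simpson(lambda t: math.exp(P(t)) * q(t) / p1(t), x0, x)
    return math.exp(-P(x)) * (y0 + integral)  # C = mu(x0) p1(x0) y0 = y0


# Assumed test problem: x y' + y = x^2, y(1) = 1, exact solution y = x^2/3 + 2/(3x).
y_num = solve_folde(lambda x: x, lambda x: 1.0, lambda x: x * x, 1.0, 1.0, 2.5)
y_exact = 2.5**2 / 3 + 2 / (3 * 2.5)
print(y_num, y_exact)  # both about 2.35
```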


Example 23.3.2. In an electric circuit with a resistance R and a capacitance C, Kirchhoff’s law gives rise to the equation R dQ/dt + Q/C = V(t), where V(t) is the time-dependent voltage and Q is the (instantaneous) charge on the capacitor. This is a simple FOLDE with p₁ = R, p₀ = 1/C, and q = V. The integrating factor is

$$\mu(t) = \frac{1}{R}\exp\left(\int\frac{dt}{RC}\right) = \frac{e^{t/RC}}{R},$$

which yields

$$Q(t) = e^{-t/RC}\left[B + \frac{1}{R}\int e^{t/RC}V(t)\,dt\right] = Be^{-t/RC} + \frac{e^{-t/RC}}{R}\int e^{t/RC}V(t)\,dt.$$

Recall that an indefinite integral can be written as a definite integral whose upper limit is the independent variable, in which case we need to use a different symbol for the integration variable. For the arbitrary lower limit, choose zero. We then have

$$Q(t) = Be^{-t/RC} + \frac{e^{-t/RC}}{R}\int_0^t e^{s/RC}V(s)\,ds. \tag{23.14}$$

Let Q(0) = Q₀ be the initial charge. Then, substituting t = 0 in (23.14), we get Q₀ = B, and the charge at time t will be given by

$$Q(t) = Q_0e^{-t/RC} + \frac{e^{-t/RC}}{R}\int_0^t e^{s/RC}V(s)\,ds. \tag{23.15}$$

As a specific example, assume that the voltage is a constant V₀, as in the case of a battery. Then the charge on the capacitor as a function of time will be

$$Q(t) = Q_0e^{-t/RC} + V_0C\left(1 - e^{-t/RC}\right).$$

It is interesting to note that the final charge Q(∞) is V₀C, independent of the initial charge. Intuitively, this is what we expect, of course, as the “capacity” of a capacitor to hold electric charge should not depend on its initial charge. ■
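To connect Equation (23.15) with the battery formula, the sketch below evaluates the integral in (23.15) with Simpson's rule for a constant voltage and compares the result with Q₀e^{−t/RC} + V₀C(1 − e^{−t/RC}). The circuit values (R = 1 kΩ, C = 1 μF, V₀ = 5 V, Q₀ = 2 μC) are assumed for illustration only. At t = 10RC the charge is already very close to V₀C.

```python
import math


def charge(t, Q0, R, C, V, n=400):
    """Q(t) from Eq. (23.15), evaluating the integral with composite Simpson."""
    tau = R * C
    h = t / n

    def f(s):
        return math.exp(s / tau) * V(s)

    integral = f(0) + f(t)
    integral += 4 * sum(f(i * h) for i in range(1, n, 2))
    integral += 2 * sum(f(i * h) for i in range(2, n, 2))
    integral *= h / 3
    return Q0 * math.exp(-t / tau) + math.exp(-t / tau) / R * integral


# Assumed circuit values (illustrative only): R = 1e3 ohm, C = 1e-6 F, V0 = 5 V, Q0 = 2e-6 C.
R, C, V0, Q0 = 1e3, 1e-6, 5.0, 2e-6


def battery(s):
    return V0


for t in (0.5e-3, 2e-3, 10e-3):
    closed = Q0 * math.exp(-t / (R * C)) + V0 * C * (1 - math.exp(-t / (R * C)))
    print(t, charge(t, Q0, R, C, battery), closed)
print("V0*C =", V0 * C)  # the limiting charge Q(infinity)
```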


Example 23.3.3. As a concrete illustration of the general formula derived in the previous example, we find the charge on a capacitor in an RC circuit when a voltage V(t) = V₀ cos ωt is applied to it for a period T and then removed. V(t) can thus be written as

$$V(t) = \begin{cases} V_0\cos\omega t & \text{if } t < T,\\ 0 & \text{if } t > T.\end{cases}$$

The general solution is given as Equation (23.15). We have to distinguish between two regions in time, t < T and t > T.

(a) For t < T, we have (using a table of integrals)

$$\begin{aligned}
Q(t) &= Q_0e^{-t/RC} + \frac{e^{-t/RC}}{R}\int_0^t e^{s/RC}V_0\cos\omega s\,ds\\
&= Q_0e^{-t/RC} + \frac{V_0}{R}\,\frac{1}{(1/RC)^2 + \omega^2}\left[-\frac{1}{RC}e^{-t/RC} + \frac{\cos\omega t}{RC} + \omega\sin\omega t\right].
\end{aligned}$$

If T ≫ RC, and we wait long enough,³ i.e., t ≫ RC, then only the oscillatory part survives, because the exponentials, with their large negative exponents, become negligible. Thus,

$$Q(t) \approx \frac{V_0}{R}\,\frac{1}{(1/RC)^2 + \omega^2}\left(\frac{\cos\omega t}{RC} + \omega\sin\omega t\right).$$

The charge Q(t) oscillates with the same frequency as the driving voltage.

(b) For t > T, the integral goes up to T, beyond which V(t) is zero. Hence, we have

$$\begin{aligned}
Q(t) &= Q_0e^{-t/RC} + \frac{e^{-t/RC}}{R}\int_0^T e^{s/RC}V_0\cos\omega s\,ds\\
&= Q_0e^{-t/RC} + \frac{V_0/R}{(1/RC)^2 + \omega^2}\left[-\frac{1}{RC}e^{-t/RC} + e^{(T-t)/RC}\left(\frac{\cos\omega T}{RC} + \omega\sin\omega T\right)\right].
\end{aligned}$$

We note that the oscillation has stopped (sine and cosine terms are merely constants now), and for t − T ≫ RC, the charge on the capacitor becomes negligibly small: if there is no applied voltage, the capacitor will discharge. ■
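The closed forms in parts (a) and (b) can be cross-checked against a direct numerical evaluation of Equation (23.15). The sketch below uses assumed dimensionless parameters Q₀ = 0.2, R = C = V₀ = 1, ω = 3, T = 5. It computes the integral with Simpson's rule, cutting it off at min(t, T) because V vanishes beyond T, and compares the result with the two formulas of the example.

```python
import math


def simpson(f, lo, hi, n=2000):
    """Composite Simpson rule with n (even) subintervals."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    total += 4 * sum(f(lo + i * h) for i in range(1, n, 2))
    total += 2 * sum(f(lo + i * h) for i in range(2, n, 2))
    return total * h / 3


def q_numeric(t, Q0, R, C, V0, w, T):
    """Eq. (23.15) with V(s) = V0 cos(w s) for s < T and V(s) = 0 afterwards."""
    tau = R * C
    upper = min(t, T)  # the integrand vanishes beyond T
    integral = simpson(lambda s: math.exp(s / tau) * V0 * math.cos(w * s), 0.0, upper)
    return Q0 * math.exp(-t / tau) + math.exp(-t / tau) / R * integral


def q_closed(t, Q0, R, C, V0, w, T):
    """Closed forms of Example 23.3.3: part (a) for t < T, part (b) for t > T."""
    a = 1 / (R * C)
    pre = V0 / R / (a * a + w * w)
    if t < T:
        bracket = -a * math.exp(-a * t) + a * math.cos(w * t) + w * math.sin(w * t)
    else:
        bracket = (-a * math.exp(-a * t)
                   + math.exp(a * (T - t)) * (a * math.cos(w * T) + w * math.sin(w * T)))
    return Q0 * math.exp(-a * t) + pre * bracket


# Assumed dimensionless parameters: Q0, R, C, V0, omega, T.
params = (0.2, 1.0, 1.0, 1.0, 3.0, 5.0)
for t in (2.0, 7.0, 20.0):
    print(t, q_numeric(t, *params), q_closed(t, *params))
```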


Although first-order linear DEs can always be solved, yielding solutions as given in Equation (23.12), no general rule can be applied to solve a general FODE. Nevertheless, it can be shown that a solution of the normal FODE (23.1) exists, at least near the initial point, whenever F is continuous. Under some mild additional conditions (for instance, continuity of ∂F/∂y), this solution is unique. Some special nonlinear FODEs can be solved using certain techniques, some of which are described in the following examples as well as in the problems at the end of the chapter.

Example 23.3.4. In Problem 23.11 you are asked to find the velocity of a falling object when the air drag is proportional to velocity. This is a good approximation at low velocities for small objects. At higher speeds, and for larger objects, the drag force becomes proportional to higher powers of speed. Let us consider the case when the drag force is proportional to v². Then the second law of motion becomes

$$m\frac{dv}{dt} = mg - bv^2 \;\Longrightarrow\; \frac{dv}{dt} = g - \gamma v^2, \qquad \gamma = \frac{b}{m}.$$

³ Of course, we still assume that t < T.
This equation can be written as

$$\frac{dv}{g - \gamma v^2} = dt \;\Longrightarrow\; \frac{dv}{A^2 - v^2} = \gamma\,dt, \qquad A^2 = \frac{g}{\gamma}. \tag{23.16}$$

Now we rewrite

$$\frac{1}{A^2 - v^2} = \frac{1}{2A}\left(\frac{1}{v + A} - \frac{1}{v - A}\right),$$

multiply both sides of Equation (23.16) by 2A, and integrate to obtain

$$\ln|v + A| - \ln|v - A| = 2A\gamma t + \ln C,$$

where we have written the constant of integration as ln C for convenience. This equation can be rewritten as

$$\left|\frac{v + A}{v - A}\right| = Ce^{2A\gamma t}.$$

Suppose that at t = 0, the velocity of the falling object is v₀. Then

$$\left|\frac{v_0 + A}{v_0 - A}\right| = C \qquad\text{and}\qquad \left|\frac{v + A}{v - A}\right| = \left|\frac{v_0 + A}{v_0 - A}\right|e^{2A\gamma t}.$$

Now note that A > 0, and v > 0 (if we take “down” to be the positive direction). Therefore, the last equation becomes

$$\frac{v + A}{|v - A|} = \frac{v_0 + A}{|v_0 - A|}e^{2A\gamma t}.$$

Suppose that v₀ > A. Then we can remove the absolute value sign from the RHS, and since the two sides must agree at t = 0, we can remove the absolute value sign on the LHS as well. Similarly, if v₀ < A, then v < A as well. It follows that

$$\frac{v + A}{v - A} = \frac{v_0 + A}{v_0 - A}e^{2A\gamma t}.$$

Cross-multiplying and solving for v gives

$$(v + A)(v_0 - A) = (v - A)(v_0 + A)e^{2A\gamma t},$$

$$\begin{aligned}
v &= A\,\frac{(v_0 + A)e^{2A\gamma t} + v_0 - A}{(v_0 + A)e^{2A\gamma t} - (v_0 - A)} = A\,\frac{v_0(e^{2A\gamma t} + 1) + A(e^{2A\gamma t} - 1)}{v_0(e^{2A\gamma t} - 1) + A(e^{2A\gamma t} + 1)}\\
&= A\,\frac{v_0\cosh(A\gamma t) + A\sinh(A\gamma t)}{v_0\sinh(A\gamma t) + A\cosh(A\gamma t)}.
\end{aligned} \tag{23.17}$$


It follows from Equation (23.17) that at t = 0, the velocity is v₀, as we expect. It also shows that, when t → ∞, the velocity approaches A = √(g/γ), the so-called terminal velocity. This is the velocity at which the gravitational force and the drag force become equal, causing the acceleration of the object to be zero. The terminal velocity can thus be obtained directly from the second law without solving the differential equation: setting dv/dt = 0 gives g − γA² = 0.
Figure 23.1: The achievement of terminal velocity for a drag force that is proportional to the square of speed (the heavy curve) is considerably faster than for a drag force that is linear in speed (the light curve) if γ has the same numerical value for both cases.


Figure 23.1 shows the plot of speed as a function of time for the two cases of the drag force being proportional to v and to v², with the same proportionality constant. Because of the higher power of speed, the terminal velocity is achieved considerably more quickly for the v² force than for the v force. Furthermore, as the figure shows clearly, the terminal speed itself is much smaller for the v² force. Since larger surfaces provide a v² drag force, parachutes that have very large surfaces are desirable. ■
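As a check on Equation (23.17), the sketch below compares the closed-form velocity with a direct fourth-order Runge–Kutta integration of dv/dt = g − γv². The Runge–Kutta method is a standard numerical scheme, not one used in the text. The values g = 9.8 and γ = 0.5 (SI units) are assumed for illustration and give A ≈ 4.43 m/s. Two starting speeds are tried, one below and one above the terminal velocity, matching the two cases v₀ < A and v₀ > A discussed above.

```python
import math


def v_closed(t, v0, g, gamma):
    """Eq. (23.17): velocity for dv/dt = g - gamma v^2, with A = sqrt(g / gamma)."""
    A = math.sqrt(g / gamma)
    c, s = math.cosh(A * gamma * t), math.sinh(A * gamma * t)
    return A * (v0 * c + A * s) / (v0 * s + A * c)


def v_rk4(t, v0, g, gamma, steps=10_000):
    """Integrate dv/dt = g - gamma v^2 with classical RK4 for comparison."""
    def f(v):
        return g - gamma * v * v

    h, v = t / steps, v0
    for _ in range(steps):
        k1 = f(v)
        k2 = f(v + h * k1 / 2)
        k3 = f(v + h * k2 / 2)
        k4 = f(v + h * k3)
        v += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
    return v


g, gamma = 9.8, 0.5          # assumed illustrative values (SI units)
for v0 in (0.0, 10.0):       # start below and above the terminal velocity A ~ 4.43 m/s
    print(v0, v_closed(3.0, v0, g, gamma), v_rk4(3.0, v0, g, gamma))
print("terminal:", math.sqrt(g / gamma))
```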

Example 23.3.5. We consider here some other examples of (nonlinear) FODEs 
whose solutions are available: 


(a) Bernoulli’s FODE: This equation is of the form y′ + p(x)y + q(x)yⁿ = 0, where n ≠ 1. This DE can be simplified if we substitute y = uʳ and choose r appropriately. In terms of u, the DE becomes

$$u' + \frac{p(x)}{r}u + \frac{q(x)}{r}u^{(n-1)r+1} = 0.$$

The simplest DE (whose solution could be found by a simple integration) would be obtained if the exponent of the last term could be set equal to 1. But this would require r to be zero, which is not acceptable. The next simplest DE results if we set the exponent equal to zero, i.e., if r = 1/(1 − n). Then the DE becomes

$$u' + (1 - n)p(x)u + (1 - n)q(x) = 0,$$

which is a first-order linear DE whose solution we have already found.


(b) Homogeneous FODE: This DE is of the form

$$y' = w\!\left(\frac{y}{x}\right).$$

To find the solution, make the obvious substitution u = y/x, to obtain y′ = u + xu′ and

$$u + xu' = w(u) \;\Longrightarrow\; \frac{du}{w(u) - u} = \frac{dx}{x},$$

with the solution

$$\ln x = \int_c^{y/x}\frac{dt}{w(t) - t} \;\Longrightarrow\; x = \exp\left[\int_c^{y/x}\frac{dt}{w(t) - t}\right],$$

where c is an arbitrary constant to be determined by the initial conditions.
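The Bernoulli reduction in part (a) can be tested on an assumed concrete case: p = q = 1, n = 2, y(0) = 1/2, i.e., y′ + y + y² = 0. With r = 1/(1 − n) = −1 we have u = 1/y. The linear equation u′ + (1 − n)pu + (1 − n)q = 0 then reads u′ − u − 1 = 0, which is solved by u = (u₀ + 1)eˣ − 1. The sketch maps this back to y and compares it with a direct Runge–Kutta integration of the original nonlinear equation. All parameter values are illustrative assumptions.

```python
import math

# Assumed example: p(x) = 1, q(x) = 1, n = 2, i.e. y' + y + y^2 = 0 with y(0) = y0.
# With r = 1/(1 - n) = -1 the substitution y = u^r gives the linear DE
# u' + (1 - n) p u + (1 - n) q = 0, i.e. u' - u - 1 = 0, solved by u = (u0 + 1) e^x - 1.
n, y0 = 2, 0.5


def y_via_bernoulli(x):
    u0 = y0 ** (1 - n)              # u = y^(1/r) = y^(1 - n)
    u = (u0 + 1) * math.exp(x) - 1
    return u ** (1 / (1 - n))       # back to y = u^r


def y_rk4(x, steps=5_000):
    """Direct RK4 integration of y' = -y - y^2 for comparison."""
    def f(y):
        return -y - y * y

    h, y = x / steps, y0
    for _ in range(steps):
        k1 = f(y)
        k2 = f(y + h * k1 / 2)
        k3 = f(y + h * k2 / 2)
        k4 = f(y + h * k3)
        y += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
    return y


for x in (0.5, 1.0, 2.0):
    print(x, y_via_bernoulli(x), y_rk4(x))
```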


23.4 Problems 


23.1. Suppose that region D is contractible to a point. Using the equivalence of the vanishing of curl and the vanishing of closed line integrals, show that ∂M/∂y = ∂N/∂x is both a necessary and a sufficient condition for M dx + N dy to be exact.

23.2. Verify that μ = C/(x² + y²) is an integrating factor of x dy − y dx, which gives rise to

$$v = C\tan^{-1}\left(\frac{y}{x}\right) = \text{constant} \;\Longrightarrow\; \frac{y}{x} = C'$$

for a solution of x dy − y dx = 0.

23.3. Find the general solution of Bernoulli’s FODE

$$y' + p(x)y + q(x)y^n = 0, \qquad n \neq 1.$$

Hint: See Example 23.3.5.

23.4. Find a solution to the linear fractional DE

$$\frac{dy}{dx} = \frac{a_1x + a_2y}{b_1x + b_2y},$$

where a₁b₂ ≠ a₂b₁.

Hint: Divide the numerator and denominator by x to obtain a homogeneous FODE.


23.5. Lagrange’s FODE is y − xp(y′) − q(y′) = 0.

(a) Let y′ = t and consider x as a function of t. Using the chain rule, find dx/dt in terms of dy/dt.

(b) Differentiate Lagrange’s DE with respect to t. Use the result of this differentiation and that of (a) to arrive at $[t - p(t)]\dot{x} - \dot{p}x = \dot{q}$, where the dot indicates differentiation with respect to t.

(c) Find the (parametric) solution of the DE, considering two separate cases: t = p(t) and t ≠ p(t).

23.6. Let $u(x, y) = C$ be a solution of the DE $M\,dx + N\,dy = 0$. Show that:

(a) $(\partial u/\partial x)/M = (\partial u/\partial y)/N$; and

(b) $\mu(x, y) = (\partial u/\partial x)/M$ is an integrating factor for the DE.

23.7. Use direct differentiation to show that the function given in Equation 
(23.12) solves the FOLDE of Equation (23.9). 
23.8. Analyze the capacitor's charge in an RC circuit in which a constant potential $V_0$ is applied for a time $T > 0$ and then disconnected. Consider the cases $t < T$ and $t > T$.

23.9. Find all functions $f(x)$ whose definite integral from 0 to $x$ equals the square of their reciprocal.

23.10. (a) Let $p_1u' + p_0u = 0$ be a homogeneous FOLDE in $u$. Solve it. (Note that it can easily be integrated.)

(b) Consider $p_1y' + p_0y = q$. Let $y = uv$, where $u$ is as in (a), and obtain an equation for $v$. Solve this equation, and obtain a general solution for $p_1y' + p_0y = q$. This is the method of variation of parameters, which can also be used for second-order differential equations.

23.11. A falling body in air has a motion approximately described by the DE $m\,dv/dt = mg - bv$, where $v = dx/dt$ is the velocity of the body. Find this velocity as a function of time assuming that the object starts from rest.

23.12. Suppose that both the linear ($av$) and the quadratic ($bv^2$) terms are present in the fall of an object with air drag.

(a) Solve the DE and find the most general solution for the velocity as a function of time. Hint: Make the substitution $u = v + a/(2b)$.

(b) From this general solution, extract the solutions to the cases where only the linear and only the quadratic terms are present by taking the limits $b \to 0$ and $a \to 0$.

23.13. Take the limit of Equation (23.17) as $t \to \infty$ and show that it is equal to $\sqrt{gh}$.




Chapter 24 

Second-Order Linear 
Differential Equations 


The majority of problems encountered in physics lead to second-order linear differential equations (SOLDEs) when the so-called nonlinear terms are approximated out. Thus, a general treatment of the properties and methods of obtaining solutions to SOLDEs is essential. In this section, we investigate their general properties, and leave methods of obtaining their solutions for the next section and later chapters.

The most general SOLDE is

$$p_2(x)\frac{d^2y}{dx^2} + p_1(x)\frac{dy}{dx} + p_0(x)y = p_3(x). \tag{24.1}$$

Dividing by $p_2(x)$, and writing $p$ for $p_1/p_2$, $q$ for $p_0/p_2$, and $r$ for $p_3/p_2$, reduces Equation (24.1) to the normal form,

$$\frac{d^2y}{dx^2} + p(x)\frac{dy}{dx} + q(x)y = r(x). \tag{24.2}$$

Equation (24.2) is equivalent to (24.1) if $p_2(x) \neq 0$. The points at which $p_2(x)$ vanishes are called the singular points of the DE.

There is a crucial difference between the singular points of linear DEs and those of nonlinear DEs. For a nonlinear DE such as $(x^2 - y)y' = x^2 + y^2$, the curve $y = x^2$ is the collection of singular points. This makes it impossible to construct solutions $y = f(x)$ that are defined on an interval $I = [a, b]$ of the $x$-axis because for any $a < x < b$, there is a $y = x^2$ for which the DE is undefined. On the other hand, linear DEs do not have this problem because the coefficients of the derivatives are functions of $x$ only. Therefore, all the singular "curves" are vertical, and we can find intervals on the $x$-axis in which the DE is well behaved.



24.1 Linearity, Superposition, and Uniqueness 


Up to a multiplicative constant, a homogeneous FOLDE has only one solution, and we found the general solution of the FOLDE in closed form in Equation (23.12). A SOLDE, by contrast, always has more than one independent solution. Therefore, it is important to know how many solutions to expect for a SOLDE and what relation (if any) exists between these solutions.

We write Equation (24.1) as

$$L[y] = p_3, \tag{24.3}$$

where

$$L \equiv p_2\frac{d^2}{dx^2} + p_1\frac{d}{dx} + p_0.$$


It is clear that L is a linear operator,¹ by which we mean that for constants $\alpha$ and $\beta$, $L[\alpha y_1 + \beta y_2] = \alpha L[y_1] + \beta L[y_2]$. In particular, if $y_1$ and $y_2$ are two solutions of Equation (24.3), then

$$L[y_1 - y_2] = L[y_1] - L[y_2] = p_3 - p_3 = 0.$$

That is, the difference between any two solutions of a SOLDE is a solution² of the homogeneous equation obtained by setting $p_3 = 0$. An immediate consequence of the linearity of L is that any linear combination of solutions of the homogeneous SOLDE (HSOLDE) is also a solution. This is called the superposition principle.

We saw in the introduction to Chapter 22 that, based on physical intuition, we expect to be able to predict the behavior of a physical system if we know the DE obeyed by that system and, equally importantly, the initial data. Physical intuition also tells us that if the initial conditions are changed by an infinitesimal amount, then the solutions will be changed infinitesimally. For linear DEs this is indeed the case: their solutions are continuous functions of the initial conditions. Nonlinear DEs, by contrast, can have completely different solutions for two initial conditions that are infinitesimally close. Since initial conditions cannot be specified with mathematical precision in practice, such nonlinear DEs lead to unpredictable solutions, or chaos. This subject has received much attention in recent years, and we shall present a brief discussion of chaos in Chapter 31.

By its very nature, a prediction is expected to be unique. For linear equations this expectation becomes, in the language of mathematics, an existence and a uniqueness theorem. First, we need the following theorem:³

Theorem 24.1.1. The only solution $g(x)$ of the homogeneous equation $y'' + py' + qy = 0$, defined on the interval $[a, b]$, which satisfies $g(a) = 0 = g'(a)$, is the trivial solution, $g = 0$.

¹ Recall from Chapter 7 that an operator is a correspondence on a vector space that takes one vector and gives another. A linear operator is an operator that satisfies Equation (7.3). The vector space on which L acts is the vector space of differentiable functions.

² This conclusion is not limited to the SOLDE; it holds for all linear DEs.

³ For a proof, see Hassani, S. Mathematical Physics: A Modern Introduction to Its Foundations, Springer-Verlag, 1999, p. 354.

Let $f_1$ and $f_2$ be two solutions of (24.2) satisfying the same initial conditions on the interval $[a, b]$. This means that $f_1(a) = f_2(a) = c$ and $f_1'(a) = f_2'(a) = c'$ for some given constants $c$ and $c'$. Then it is readily seen that their difference, $g = f_1 - f_2$, satisfies the homogeneous equation [with $r(x) = 0$]. The initial conditions that $g(x)$ satisfies are clearly $g(a) = 0 = g'(a)$. By Theorem 24.1.1, $g = 0$, or $f_1 = f_2$. We have just shown

Theorem 24.1.2. (Uniqueness Theorem). If $p$ and $q$ are continuous on $[a, b]$, then at most one solution of Equation (24.2) can satisfy a given set of initial conditions.

The uniqueness theorem can be applied to any homogeneous SOLDE to find the latter's most general solution. In particular, let $f_1(x)$ and $f_2(x)$ be any two solutions of

$$y'' + p(x)y' + q(x)y = 0 \tag{24.4}$$

defined on the interval $[a, b]$. Assume that the two vectors $\mathbf{v}_1 = (f_1(a), f_1'(a))$ and $\mathbf{v}_2 = (f_2(a), f_2'(a))$ are linearly independent.⁴ Let $g(x)$ be another solution. The vector $(g(a), g'(a))$ can be written as a linear combination of $\mathbf{v}_1$ and $\mathbf{v}_2$, giving the two equations

$$g(a) = c_1f_1(a) + c_2f_2(a),$$
$$g'(a) = c_1f_1'(a) + c_2f_2'(a).$$

The function $u(x) = g(x) - c_1f_1(x) - c_2f_2(x)$ satisfies the DE (24.4) and the initial conditions $u(a) = u'(a) = 0$. It follows from Theorem 24.1.1 that $u(x) = 0$, or $g(x) = c_1f_1(x) + c_2f_2(x)$. We have proved

Theorem 24.1.3. Let $f_1$ and $f_2$ be two solutions of the HSOLDE

$$y'' + py' + qy = 0,$$

where $p$ and $q$ are continuous functions defined on the interval $[a, b]$. If $(f_1(a), f_1'(a))$ and $(f_2(a), f_2'(a))$ are linearly independent vectors, then every solution $g(x)$ of this HSOLDE is equal to some linear combination

$$g(x) = c_1f_1(x) + c_2f_2(x),$$

with constant coefficients $c_1$ and $c_2$. The functions $f_1$ and $f_2$ are called a basis of solutions of the HSOLDE.

The uniqueness theorem states that at most one solution of a SOLDE can satisfy a given set of initial conditions. Whether such a solution does exist is beyond the scope of the theorem. Under some mild
assumptions, however, it can be shown that a solution does indeed exist. We 
shall not prove this existence theorem for a general SOLDE, but shall examine 
various techniques of obtaining solutions for specific SOLDEs in this and the 
next two chapters. 

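⁴ If they are not, then some nontrivial combination $\alpha f_1 + \beta f_2$ has zero initial data at $a$, so by Theorem 24.1.1 $f_1$ and $f_2$ are proportional on all of $[a, b]$, and one must choose a different pair of solutions.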


24.2 The Wronskian

To form a basis of solutions, $f_1$ and $f_2$ must be linearly independent. It is important to note that the linear dependence or independence of a number of functions defined on the interval $[a, b]$ is a property that must hold for all $x$ in $[a, b]$. Thus, if

$$\alpha_1f_1(x_0) + \alpha_2f_2(x_0) + \cdots + \alpha_nf_n(x_0) = 0$$

for some $x_0 \in [a, b]$, it does not mean that the $f$'s are linearly dependent. For instance, $\sin x - \cos x = 0$ at $x_0 = \pi/4$, yet $\sin x$ and $\cos x$ are linearly independent. Linear dependence requires that the equality hold for all $x$ in $[a, b]$.

The nature of the linear relation between $f_1$ and $f_2$ can be determined by their Wronskian.

Definition 24.2.1. The Wronskian of any two differentiable functions $f_1(x)$ and $f_2(x)$ is defined to be

$$W(f_1, f_2; x) = f_1(x)f_2'(x) - f_2(x)f_1'(x) = \det\begin{pmatrix} f_1(x) & f_2(x) \\ f_1'(x) & f_2'(x) \end{pmatrix}.$$

If we differentiate both sides of the definition of Wronskian and substitute from Equation (24.4), we obtain

$$\begin{aligned} W'(f_1, f_2; x) &= f_1'f_2' + f_1f_2'' - f_2'f_1' - f_2f_1'' \\ &= f_1(-pf_2' - qf_2) - f_2(-pf_1' - qf_1) \\ &= -pf_1f_2' + pf_2f_1' = -p(x)W(f_1, f_2; x). \end{aligned}$$

We can easily find a solution to this DE:

$$\frac{dW}{dx} = -pW \;\Rightarrow\; \frac{dW}{W} = -p\,dx \;\Rightarrow\; \ln W = -\int_c^x p(t)\,dt + \ln C,$$

where $c$ is an arbitrary point in the interval $[a, b]$ and $C$ is the constant of integration. In fact, it is readily seen that $C = W(c)$. We therefore have

$$W(f_1, f_2; x) = W(f_1, f_2; c)\,e^{-\int_c^x p(t)\,dt}. \tag{24.5}$$


Note that $W(f_1, f_2; x) = 0$ if and only if $W(f_1, f_2; c) = 0$, and that [because the exponential in (24.5) is positive] $W(f_1, f_2; x)$ and $W(f_1, f_2; c)$ have the same sign if they are not zero. This observation leads to

Box 24.2.1. The Wronskian of any two solutions of Equation (24.4) does not change sign in the interval $[a, b]$. In particular, if the Wronskian vanishes at one point in $[a, b]$, it vanishes at all points in $[a, b]$.
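Equation (24.5) can be checked numerically without knowing any closed-form solution. The Python sketch below is a minimal illustration under stated assumptions: the coefficients $p(x) = 2x$ and $q(x) = \cos x$ are arbitrary choices (not from the text), the two solutions are generated with a standard fourth-order Runge-Kutta integrator of our own (`integrate_hsolde` is a hypothetical helper), and the initial data at $c = 0$ are chosen so that $W(c) = 1$. Equation (24.5) then predicts $W(x) = e^{-x^2}$.

```python
import math

# Hypothetical coefficients, chosen only for illustration: y'' + 2x y' + cos(x) y = 0.
def p(x):
    return 2.0 * x

def q(x):
    return math.cos(x)

def integrate_hsolde(x0, y0, yp0, x1, steps=2000):
    """RK4 integration of y'' = -p(x) y' - q(x) y, written as a system for (y, y')."""
    def f(x, y, yp):
        return yp, -p(x) * yp - q(x) * y

    h = (x1 - x0) / steps
    x, y, yp = x0, y0, yp0
    for _ in range(steps):
        k1 = f(x, y, yp)
        k2 = f(x + h / 2, y + h / 2 * k1[0], yp + h / 2 * k1[1])
        k3 = f(x + h / 2, y + h / 2 * k2[0], yp + h / 2 * k2[1])
        k4 = f(x + h, y + h * k3[0], yp + h * k3[1])
        y += h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        yp += h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
        x += h
    return y, yp

c, x = 0.0, 1.5
f1, f1p = integrate_hsolde(c, 1.0, 0.0, x)   # f1(c) = 1, f1'(c) = 0
f2, f2p = integrate_hsolde(c, 0.0, 1.0, x)   # f2(c) = 0, f2'(c) = 1, so W(c) = 1
W_numeric = f1 * f2p - f2 * f1p
# Eq. (24.5): W(x) = W(c) exp(-integral of 2t from 0 to x) = exp(-x^2)
W_formula = 1.0 * math.exp(-x ** 2)
```

Note that the sign of `W_numeric` stays positive, in line with Box 24.2.1.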
Let $f_1$ and $f_2$ be any two differentiable functions that are not necessarily solutions of any DE. If $f_1$ and $f_2$ are linearly dependent, then one is a multiple of the other, and the Wronskian is readily seen to vanish. Conversely, assume that the Wronskian is zero and that $f_1$ and $f_2$ do not vanish. Then $f_1(x)f_2'(x) - f_2(x)f_1'(x) = 0$. This gives

$$f_1\,df_2 = f_2\,df_1 \;\Rightarrow\; \frac{df_2}{f_2} = \frac{df_1}{f_1} \;\Rightarrow\; \ln f_2 = \ln f_1 + \ln C \;\Rightarrow\; f_2 = Cf_1,$$

and the two functions are linearly dependent. We have just shown that

Box 24.2.2. Two differentiable functions, which are nonzero in the interval $[a, b]$, are linearly dependent if and only if their Wronskian vanishes.


Example 24.2.1. Let $f_1(x) = x$ and $f_2(x) = |x|$ for $-1 < x < 1$. These two functions are linearly independent in the given interval, because $\alpha_1x + \alpha_2|x| = 0$ for all $x$ if and only if $\alpha_1 = \alpha_2 = 0$. The Wronskian, on the other hand, vanishes for all $x \neq 0$ in $(-1, 1)$:

$$W(f_1, f_2; x) = x\frac{d|x|}{dx} - |x|\frac{dx}{dx} = \begin{cases} x - x = 0 & \text{if } x > 0, \\ -x - (-x) = 0 & \text{if } x < 0. \end{cases}$$

This seems to be in contradiction to Box 24.2.2. It is not! Box 24.2.2 assumes that both functions are differentiable in their common interval of definition. However, $|x|$ is not differentiable at $x = 0$. ■


24.3 A Second Solution to the HSOLDE 


If we know one solution to Equation (24.4), we can use the Wronskian to obtain a second linearly independent solution. Let $W(x) = W(f_1, f_2; x)$ be the Wronskian of the two solutions $f_1$ and $f_2$. Then, by definition and Equation (24.5), we have

$$f_1(x)f_2'(x) - f_2(x)f_1'(x) = W(x) = W(c)\,e^{-\int_c^x p(t)\,dt},$$

where $c$ is an arbitrary point in the interval of interest. Given $f_1(x)$, this is a FOLDE in $f_2(x)$, which can be solved by the method of Section 23.3. In fact, $1/f_1^2(x)$ is an integrating factor, and dividing both sides by $f_1^2(x)$ gives

$$\frac{d}{dx}\left[\frac{f_2(x)}{f_1(x)}\right] = \frac{W(x)}{f_1^2(x)}$$
or

$$\frac{f_2(x)}{f_1(x)} = C + \int_\alpha^x \frac{W(s)}{f_1^2(s)}\,ds = C + \int_\alpha^x \frac{W(c)}{f_1^2(s)}\exp\left[-\int_c^s p(t)\,dt\right]ds,$$

where $C$ is an arbitrary constant of integration and $\alpha$ is a convenient point in the interval $[a, b]$. Thus,

$$f_2(x) = f_1(x)\left\{C + K\int_\alpha^x \frac{1}{f_1^2(s)}\exp\left[-\int_c^s p(t)\,dt\right]ds\right\}, \tag{24.6}$$

where we substituted $K$ for $W(c)$. We do not have to know $W(x)$ (this would require knowledge of $f_2$, which we are trying to calculate!) to obtain $K = W(c)$. In fact, it is a good exercise for the reader to show that $f_2$, as given by (24.6), indeed satisfies Equation (24.4) no matter what $K$ is. Note also that $f_2(\alpha) = Cf_1(\alpha)$. Whenever possible, and convenient, it is customary to set $C = 0$ because its presence simply gives a term that is proportional to the known solution $f_1(x)$.


Example 24.3.1. (a) A solution to the SOLDE $y'' - k^2y = 0$ is $e^{kx}$. To find a second solution, we let $C = 0$ and $K = 1$ in Equation (24.6). Since $p(x) = 0$, we have

$$f_2(x) = e^{kx}\int_\alpha^x \frac{ds}{e^{2ks}} = e^{kx}\left(-\frac{e^{-2kx}}{2k} + \frac{e^{-2k\alpha}}{2k}\right) = -\frac{1}{2k}e^{-kx} + \frac{e^{-2k\alpha}}{2k}e^{kx},$$

which, ignoring the second term (proportional to the first solution), leads directly to the choice of $e^{-kx}$ as a second solution.

(b) The differential equation $y'' + k^2y = 0$, which arises in mechanics in the study of the motion of a mass attached to the end of a spring, has $\sin kx$ as a solution. With $C = 0$, $c = \alpha = \pi/(2k)$, and $K = 1$, we get

$$f_2(x) = \sin kx\int_{\pi/(2k)}^x \frac{ds}{\sin^2 ks} = -\frac{1}{k}\sin kx\,\cot ks\,\Big|_{\pi/(2k)}^x = -\frac{1}{k}\cos kx.$$

Thus, $\sin kx$ and $\cos kx$ form a basis of solutions, and a general solution is of the form

$$y(x) = A\cos kx + B\sin kx,$$

a result that should be familiar to the reader from introductory physics.

(c) For the solutions in part (a),

$$W(x) = \det\begin{pmatrix} e^{kx} & e^{-kx} \\ ke^{kx} & -ke^{-kx} \end{pmatrix} = -2k,$$

and for those in part (b),

$$W(x) = \det\begin{pmatrix} \sin kx & \cos kx \\ k\cos kx & -k\sin kx \end{pmatrix} = -k.$$

Both Wronskians are constant. This is a special case of a result that holds for all DEs of the form $y'' + q(x)y = 0$: with $p = 0$, the exponential in (24.5) equals 1, so $W(x) = W(c)$. ■
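Equation (24.6) can also be evaluated purely numerically when the integrals have no closed form. The Python sketch below is an assumption-laden illustration: `simpson` and `second_solution` are our own helper names, the integrals are approximated with the composite Simpson rule, and the test case is part (b) with the arbitrary choice $k = 2$. With $C = 0$, $K = 1$, and $c = \alpha = \pi/(2k)$, the result should reproduce $f_2(x) = -\cos(kx)/k$.

```python
import math

def simpson(g, a, b, n=200):
    """Composite Simpson rule for the integral of g from a to b (n must be even)."""
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3

def second_solution(f1, p, x, alpha, c, K=1.0, C=0.0):
    """Evaluate f2(x) from Eq. (24.6), given one known solution f1 and the coefficient p."""
    def integrand(s):
        return math.exp(-simpson(p, c, s)) / f1(s) ** 2
    return f1(x) * (C + K * simpson(integrand, alpha, x))

k = 2.0
f1 = lambda x: math.sin(k * x)
p_zero = lambda x: 0.0               # y'' + k^2 y = 0 has p(x) = 0
a0 = math.pi / (2 * k)               # c = alpha = pi/(2k), as in part (b)
f2 = second_solution(f1, p_zero, 1.0, a0, a0)
f2_expected = -math.cos(k * 1.0) / k
```

The nested Simpson call for $\int_c^s p\,dt$ is wasteful when $p = 0$, but it keeps the sketch faithful to the general form of (24.6).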
Most special functions used in mathematical physics are solutions of SOLDEs. The behavior of these functions at certain special points is determined by the physics of the particular problem. In most situations physical expectation leads to a preference for one particular solution over the other. For example, although there are two linearly independent solutions to the Legendre DE,

$$\frac{d}{dx}\left[(1 - x^2)\frac{dy}{dx}\right] + n(n + 1)y = 0,$$

the solution that is most frequently encountered is a Legendre polynomial $P_n(x)$, discussed in Chapter 26. The other solution can be obtained by using Equation (24.6).


24.4 The General Solution to an ISOLDE 

We now determine the most general solution of an inhomogeneous SOLDE (ISOLDE). Let $g(x)$ be a particular solution of

$$L[y] = y'' + py' + qy = r(x) \tag{24.7}$$

and let $h(x)$ be any other solution of this equation. Then $h(x) - g(x)$ satisfies Equation (24.4) and, by Theorem 24.1.3, can be written as a linear combination of a basis of solutions $f_1(x)$ and $f_2(x)$. It follows that

$$h(x) = c_1f_1(x) + c_2f_2(x) + g(x). \tag{24.8}$$


Box 24.4.1. If we have a particular solution of the ISOLDE of Equation (24.7) and two basis solutions of the HSOLDE, then the most general solution of (24.7) can be expressed as the sum of a linear combination of the two basis solutions and the particular solution.


We know how to find a second solution to the HSOLDE once we know one 
solution. We now show that knowing one such solution will also allow us to 
find a particular solution to the ISOLDE. The method we use is called the 
method of variation of constants. 

Let $f_1$ and $f_2$ be the two (known) solutions of the HSOLDE and $g(x)$ the sought-after solution to Equation (24.7). Write $g$ as $g(x) = f_1(x)v(x)$ with $v$ a function to be determined. Substitute this in (24.7) to get a SOLDE for $v(x)$:

$$v'' + \left(p + \frac{2f_1'}{f_1}\right)v' = \frac{r}{f_1}.$$

This is a first-order linear DE in $v'$, which has a solution of the form (see Problem 24.6)

$$v' = \frac{W(x)}{f_1^2(x)}\left[C + \int_a^x \frac{f_1(t)r(t)}{W(t)}\,dt\right],$$

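where $W(x)$ is the (known) Wronskian of the basis solutions $f_1$ and $f_2$ of the HSOLDE corresponding to Equation (24.7). Substituting

$$\frac{W(x)}{f_1^2(x)} = \frac{f_1(x)f_2'(x) - f_2(x)f_1'(x)}{f_1^2(x)} = \frac{d}{dx}\left(\frac{f_2}{f_1}\right)$$

in the above expression for $v'$ and setting $C = 0$ (we are interested in a particular solution), we get

$$\begin{aligned} \frac{dv}{dx} &= \frac{d}{dx}\left(\frac{f_2}{f_1}\right)\int_a^x \frac{f_1(t)r(t)}{W(t)}\,dt \\ &= \frac{d}{dx}\left[\frac{f_2(x)}{f_1(x)}\int_a^x \frac{f_1(t)r(t)}{W(t)}\,dt\right] - \frac{f_2(x)}{f_1(x)}\underbrace{\frac{d}{dx}\int_a^x \frac{f_1(t)r(t)}{W(t)}\,dt}_{=\,f_1(x)r(x)/W(x)} \end{aligned}$$

and, by integration,

$$v(x) = \frac{f_2(x)}{f_1(x)}\int_a^x \frac{f_1(t)r(t)}{W(t)}\,dt - \int_a^x \frac{f_2(t)r(t)}{W(t)}\,dt,$$

where in the last integral we used $t$ as the variable of integration. This leads to the particular solution

$$g(x) = f_1(x)v(x) = f_2(x)\int_a^x \frac{f_1(t)r(t)}{W(t)}\,dt - f_1(x)\int_a^x \frac{f_2(t)r(t)}{W(t)}\,dt. \tag{24.9}$$

Note how symmetrically $f_1$ and $f_2$ appear in the final result.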

It thus follows that 



Box 24.4.2. Given a single solution $f_1(x)$ of the homogeneous equation corresponding to an ISOLDE, one can use Equation (24.6) to find a second solution $f_2(x)$ of the homogeneous equation and Equation (24.9) to find a particular solution $g(x)$. The most general solution $h$ will then be

$$h(x) = c_1f_1(x) + c_2f_2(x) + g(x).$$
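As a concrete check of Equation (24.9), consider the hypothetical ISOLDE $y'' + y = x$ (our example, not the text's), with $f_1 = \sin x$, $f_2 = \cos x$, and $W = f_1f_2' - f_2f_1' = -1$. Working the integrals by hand with lower limit $a = 0$ gives $g(x) = x - \sin x$, and indeed $g'' + g = \sin x + x - \sin x = x$. The Python sketch below evaluates (24.9) by the composite Simpson rule; `particular_solution` and `simpson` are our own illustrative helpers.

```python
import math

def simpson(g, a, b, n=400):
    """Composite Simpson rule for the integral of g from a to b (n must be even)."""
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3

def particular_solution(f1, f2, W, r, x, a=0.0):
    """Particular solution of y'' + p y' + q y = r(x) from Eq. (24.9)."""
    I1 = simpson(lambda t: f1(t) * r(t) / W(t), a, x)
    I2 = simpson(lambda t: f2(t) * r(t) / W(t), a, x)
    return f2(x) * I1 - f1(x) * I2

f1, f2 = math.sin, math.cos
W = lambda t: f1(t) * (-math.sin(t)) - f2(t) * math.cos(t)   # f1 f2' - f2 f1' = -1
r = lambda t: t                                            # right-hand side r(x) = x
g = particular_solution(f1, f2, W, r, 1.3)
g_expected = 1.3 - math.sin(1.3)
```

Changing the lower limit $a$ only adds a combination of $f_1$ and $f_2$ to $g$, which is absorbed into $c_1$ and $c_2$ in Box 24.4.2.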


24.5 Sturm-Liouville Theory

We saw in Chapter 22 that the separation of PDEs normally results in expressions of the form

$$L[u] + \lambda u = 0, \quad\text{or}\quad p_2(x)\frac{d^2u}{dx^2} + p_1(x)\frac{du}{dx} + p_0(x)u + \lambda u = 0, \tag{24.10}$$

where $u$ is a function of a single variable and $\lambda$ is, a priori, an arbitrary constant. This is an eigenvalue equation for the operator L just as Equation (7.17) was an eigenvalue equation for the matrix T. In this section, we try to learn some properties of this eigenvalue problem, but first we need to understand the concept of the adjoint of a differential operator.
24.5.1 Adjoint Differential Operators

In our discussion of the eigenvalues and eigenvectors of matrices in Section 7.4, 
symmetric matrices seemed to be special (see Theorem 7.4.1). The analog of a 
symmetric matrix in the case of differential operators (DO) is a self-adjoint 
differential operator. 

The HSOLDE

$$L[y] = p_2(x)y'' + p_1(x)y' + p_0(x)y = 0 \tag{24.11}$$

is said to be exact if it can be written as

$$L[y] = \frac{d}{dx}[A(x)y' + B(x)y]. \tag{24.12}$$

An integrating factor for $L[y]$ is a function $\mu(x)$ such that $\mu(x)L[y]$ is exact. If an integrating factor exists, then Equation (24.11) reduces to

$$\frac{d}{dx}[A(x)y' + B(x)y] = 0 \;\Rightarrow\; A(x)y' + B(x)y = C,$$

a FOLDE with a constant inhomogeneous term whose solution is given in Theorem 23.3.1. Even the ISOLDE corresponding to Equation (24.11) can be solved, because

$$\mu(x)L[y] = \mu(x)r(x) \;\Rightarrow\; \frac{d}{dx}[A(x)y' + B(x)y] = \mu(x)r(x) \;\Rightarrow\; A(x)y' + B(x)y = \int_a^x \mu(t)r(t)\,dt,$$

which is a general FOLDE. Thus, the existence of an integrating factor completely solves a SOLDE. It is therefore important to know whether or not a SOLDE admits an integrating factor.

If the SOLDE is exact, then (24.12) must equal (24.11), implying that $p_2 = A$, $p_1 = A' + B$, and $p_0 = B'$. It follows that $p_2'' = A''$, $p_1' = A'' + B'$, and $p_0 = B'$, which in turn give $p_2'' - p_1' + p_0 = 0$. Conversely, if $p_2'' - p_1' + p_0 = 0$, then, substituting $p_0 = -p_2'' + p_1'$ in the LHS of Equation (24.11), we obtain

$$\begin{aligned} p_2y'' + p_1y' + p_0y &= p_2y'' + p_1y' + (-p_2'' + p_1')y \\ &= p_2y'' - p_2''y + (p_1y)' = (p_2y' - p_2'y)' + (p_1y)' \\ &= \frac{d}{dx}(p_2y' - p_2'y + p_1y), \end{aligned}$$

and the DE is exact. Therefore,

Box 24.5.1. The SOLDE of Equation (24.11) is exact if and only if $p_2'' - p_1' + p_0 = 0$.
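For polynomial coefficients, the test of Box 24.5.1 can be carried out exactly with integer arithmetic. The Python sketch below represents a polynomial by its coefficient list `[c0, c1, c2, ...]` (a representation chosen for illustration; the function names are our own) and, for an exact SOLDE, also returns $A = p_2$ and $B = p_1 - p_2'$, which follow from $p_2 = A$ and $p_1 = A' + B$. The test cases are our own examples: $x^2y'' + 4xy' + 2y$ is exact and equals $(x^2y' + 2xy)'$, while the Legendre operator $(1 - x^2)y'' - 2xy' + 6y$ is not.

```python
def deriv(c):
    """Derivative of the polynomial c[0] + c[1] x + c[2] x^2 + ..."""
    return [i * c[i] for i in range(1, len(c))]

def add(a, b, sign=1):
    """Coefficientwise a + sign * b, padding the shorter list with zeros."""
    n = max(len(a), len(b))
    a = a + [0] * (n - len(a))
    b = b + [0] * (n - len(b))
    return [x + sign * y for x, y in zip(a, b)]

def is_exact(p2, p1, p0):
    """Box 24.5.1: the SOLDE p2 y'' + p1 y' + p0 y is exact iff p2'' - p1' + p0 = 0."""
    expr = add(add(deriv(deriv(p2)), deriv(p1), -1), p0)
    return all(coef == 0 for coef in expr)

def exact_form(p2, p1):
    """For an exact SOLDE, return (A, B) with L[y] = (A y' + B y)': A = p2, B = p1 - p2'."""
    return p2, add(p1, deriv(p2), -1)

# x^2 y'' + 4x y' + 2y  versus  (1 - x^2) y'' - 2x y' + 6y
print(is_exact([0, 0, 1], [0, 4], [2]))      # True
print(is_exact([1, 0, -1], [0, -2], [6]))    # False
print(exact_form([0, 0, 1], [0, 4]))         # ([0, 0, 1], [0, 2]), i.e. A = x^2, B = 2x
```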


A general SOLDE is clearly not exact. Can we make it exact by multiplying it by an integrating factor as we did with a FOLDE? An immediate consequence of Box 24.5.1 is

Box 24.5.2. A function $\mu$ is an integrating factor of the SOLDE of Equation (24.11) if and only if it is a solution of the HSOLDE

$$M[\mu] = (p_2\mu)'' - (p_1\mu)' + p_0\mu = 0. \tag{24.13}$$


We can expand Equation (24.13) to obtain the equivalent equation

$$p_2\mu'' + (2p_2' - p_1)\mu' + (p_2'' - p_1' + p_0)\mu = 0. \tag{24.14}$$

The operator M given by

$$M \equiv p_2\frac{d^2}{dx^2} + (2p_2' - p_1)\frac{d}{dx} + (p_2'' - p_1' + p_0) \tag{24.15}$$

is called the adjoint of the operator L and denoted by $M = L^\dagger$. This is the analog of the transpose $T^t$ of a matrix $T$.

Box 24.5.2 confirms the existence of an integrating factor. However, the latter can be obtained only by solving Equation (24.14), which is at least as difficult as solving the original differential equation! In contrast, the integrating factor for a FOLDE can be obtained by a mere integration [see Equation (23.13)].

Although integrating factors for SOLDEs are not as useful as their counterparts for FOLDEs, they can facilitate the study of SOLDEs. Let us first note that the adjoint of the adjoint of a differential operator is the original operator: $(L^\dagger)^\dagger = L$ (see Problem 24.10). This suggests that if $v$ is an integrating factor of $L[u]$, then $u$ will be an integrating factor of $M[v] = L^\dagger[v]$. In particular, multiplying the first one by $v$ and the second one by $u$ and subtracting the results, we obtain [see Equations (24.11) and (24.13)]

$$vL[u] - uM[v] = (vp_2)u'' - u(p_2v)'' + (vp_1)u' + u(p_1v)',$$

which can be simplified to

$$vL[u] - uM[v] = \frac{d}{dx}[p_2vu' - (p_2v)'u + p_1uv]. \tag{24.16}$$

Integrating this from $a$ to $b$ yields

$$\int_a^b (vL[u] - uM[v])\,dx = [p_2vu' - (p_2v)'u + p_1uv]\Big|_a^b. \tag{24.17}$$

Equations (24.16) and (24.17) are called the Lagrange identities.

As in the case of matrices, a self-adjoint differential operator (corresponding to a symmetric matrix, for which $T = T^t$) merits special consideration.
For $M[v] = L^\dagger[v]$ to be equal to $L[v]$, we must have [see Equations (24.11) and (24.14)] $2p_2' - p_1 = p_1$ and $p_2'' - p_1' + p_0 = p_0$. The first equation gives $p_2' = p_1$, which also solves the second equation. If this condition holds, then we can write Equation (24.11) as $L[y] = p_2y'' + p_2'y' + p_0y$, or

$$L[y] = \frac{d}{dx}\left[p_2(x)\frac{dy}{dx}\right] + p_0(x)y = 0.$$

Can we make all SOLDEs self-adjoint? Let us multiply both sides of Equation (24.11) by a function $w(x)$, to be determined later. We get the new DE

$$w(x)p_2(x)y'' + w(x)p_1(x)y' + w(x)p_0(x)y = 0,$$

which we desire to be self-adjoint. This will be accomplished if we choose $w(x)$ such that $wp_1 = (wp_2)'$, or $p_2w' + w(p_2' - p_1) = 0$, which can be readily integrated to give

$$w(x) = \frac{1}{p_2}\exp\left[\int^x \frac{p_1(t)}{p_2(t)}\,dt\right].$$

We have just proved the following:

Theorem 24.5.1. The SOLDE of Equation (24.11) is self-adjoint if and only if $p_2' = p_1$, in which case the DE has the form

$$\frac{d}{dx}\left[p_2(x)\frac{dy}{dx}\right] + p_0(x)y = 0.$$

If it is not self-adjoint, it can be made so by multiplying it through by

$$w(x) = \frac{1}{p_2(x)}\exp\left[\int^x \frac{p_1(t)}{p_2(t)}\,dt\right].$$
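The multiplier of Theorem 24.5.1 is easy to evaluate numerically. The Python sketch below approximates the integral with the composite Simpson rule; `self_adjoint_weight` is our own illustrative helper, and the lower limit $x_0$ of the indefinite integral is an extra parameter we introduce, since it only fixes an overall constant in $w$. For the normal forms of the Legendre and Bessel equations (coefficients as in Example 24.5.2), it should return $w = 1 - x^2$ (with $x_0 = 0$) and $w = x$ (with $x_0 = 1$).

```python
import math

def simpson(g, a, b, n=400):
    """Composite Simpson rule for the integral of g from a to b (n must be even)."""
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3

def self_adjoint_weight(p2, p1, x, x0):
    """w(x) = exp(integral of p1/p2 from x0 to x) / p2(x); x0 only sets a constant factor."""
    return math.exp(simpson(lambda t: p1(t) / p2(t), x0, x)) / p2(x)

# Legendre in normal form: p2 = 1, p1 = -2x/(1 - x^2); expect w = 1 - x^2
w_legendre = self_adjoint_weight(lambda t: 1.0, lambda t: -2 * t / (1 - t * t), 0.5, 0.0)
# Bessel in normal form: p2 = 1, p1 = 1/x; expect w = x
w_bessel = self_adjoint_weight(lambda t: 1.0, lambda t: 1 / t, 2.0, 1.0)
```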


Example 24.5.2. (a) The Legendre equation in normal form,

$$y'' - \frac{2x}{1 - x^2}y' + \frac{\lambda}{1 - x^2}y = 0,$$

is not self-adjoint. However, we get a self-adjoint version if we multiply through by $w(x) = 1 - x^2$:

$$(1 - x^2)y'' - 2xy' + \lambda y = 0, \quad\text{or}\quad [(1 - x^2)y']' + \lambda y = 0.$$

(b) Similarly, the normal form of the Bessel equation

$$y'' + \frac{1}{x}y' + \left(1 - \frac{\nu^2}{x^2}\right)y = 0$$

is not self-adjoint, but multiplying through by $w(x) = x$ yields

$$\frac{d}{dx}\left(x\frac{dy}{dx}\right) + \left(x - \frac{\nu^2}{x}\right)y = 0,$$

which is clearly self-adjoint.


24.5.2 Sturm-Liouville System

Now that we know that every SOLDE can be made self-adjoint, let's apply the procedure to our starting DE (24.10). If we multiply that equation by the $w(x)$ of Theorem 24.5.1, it becomes self-adjoint and can be written as

$$\frac{d}{dx}\left[p(x)\frac{du}{dx}\right] + [\lambda w(x) - q(x)]u = 0$$

or

$$L[u] \equiv \frac{d}{dx}\left[p(x)\frac{du}{dx}\right] - q(x)u = -\lambda w(x)u \tag{24.18}$$

with $p(x) = w(x)p_2(x)$ and $q(x) = -p_0(x)w(x)$. Equation (24.18) is the standard form of the Sturm-Liouville (S-L) equation.

The appearance of $w$ is the result of our desire to render the differential operator self-adjoint. It also appears in another context. Write the Lagrange identity (24.16) for a self-adjoint differential operator L:

$$uL[v] - vL[u] = \frac{d}{dx}\{p(x)[u(x)v'(x) - v(x)u'(x)]\}. \tag{24.19}$$

If we specialize this identity to the S-L equation of (24.18) with $u = u_1$ corresponding to the eigenvalue $\lambda_1$ and $v = u_2$ corresponding to the eigenvalue $\lambda_2$, we obtain for the LHS

$$u_1L[u_2] - u_2L[u_1] = u_1(-\lambda_2wu_2) + u_2(\lambda_1wu_1) = (\lambda_1 - \lambda_2)wu_1u_2.$$

Integrating both sides of (24.19) then yields

$$(\lambda_1 - \lambda_2)\int_a^b wu_1u_2\,dx = \{p(x)[u_1(x)u_2'(x) - u_2(x)u_1'(x)]\}\Big|_a^b. \tag{24.20}$$

A desired property of the solutions of a self-adjoint DE is their orthogonality when they belong to different eigenvalues. This property will be satisfied if we assume an inner product integral with weight function $w(x)$, and if the RHS of Equation (24.20) vanishes. There are various boundary conditions (BC) that fulfill the latter requirement. One such set is the separated boundary conditions:

$$\begin{aligned} \alpha_1u(a) + \beta_1u'(a) &= 0, \\ \alpha_2u(b) + \beta_2u'(b) &= 0, \end{aligned} \tag{24.21}$$

where $\alpha_1$, $\alpha_2$, $\beta_1$, and $\beta_2$ are real constants. Another set of appropriate boundary conditions, valid when $p(a) = p(b)$, is the periodic BC given by

$$u(a) = u(b) \quad\text{and}\quad u'(a) = u'(b). \tag{24.22}$$

The collection of the DO and the boundary conditions is called a Sturm-Liouville (S-L) system.
Example 24.5.3. For fixed $\nu$ the DE

$$\frac{d^2u}{dr^2} + \frac{1}{r}\frac{du}{dr} + \left(k^2 - \frac{\nu^2}{r^2}\right)u = 0, \qquad 0 < r < b, \tag{24.23}$$

transforms into the Bessel equation $u'' + u'/x + (1 - \nu^2/x^2)u = 0$ if we make the substitution $kr = x$. Thus, the solution of the S-L equation (24.23) that is analytic at $r = 0$ and corresponds to the eigenvalue $k^2$ is $u_k(r) = J_\nu(kr)$, because Bessel functions $J_\nu(x)$ are entire functions. For two different eigenvalues, $k_1^2$ and $k_2^2$, the eigenfunctions are orthogonal if the boundary term of (24.20) corresponding to Equation (24.23) vanishes, that is, if

$$\left\{r\left[J_\nu(k_1r)\frac{d}{dr}J_\nu(k_2r) - J_\nu(k_2r)\frac{d}{dr}J_\nu(k_1r)\right]\right\}\Big|_0^b$$

vanishes, which will occur if and only if $k_2J_\nu(k_1b)J_\nu'(k_2b) - k_1J_\nu(k_2b)J_\nu'(k_1b) = 0$. A common choice is to take $J_\nu(k_1b) = 0 = J_\nu(k_2b)$, that is, to take both $k_1b$ and $k_2b$ as (different) roots of the Bessel function of order $\nu$. We thus have $\int_0^b rJ_\nu(k_ir)J_\nu(k_jr)\,dr = 0$ if $k_i$ and $k_j$ are different roots of $J_\nu(kb) = 0$.

The Legendre equation

$$\frac{d}{dx}\left[(1 - x^2)\frac{du}{dx}\right] + \lambda u = 0, \quad\text{where } -1 < x < 1,$$

is already self-adjoint. Thus, $w(x) = 1$ and $p(x) = 1 - x^2$. Solutions of this DE corresponding to $\lambda = n(n + 1)$ are the Legendre polynomials $P_n(x)$. The boundary term of (24.20) clearly vanishes at $a = -1$ and $b = +1$, because $p(\pm 1) = 0$, and we obtain the orthogonality relation $\int_{-1}^{1} P_n(x)P_m(x)\,dx = 0$ if $m \neq n$.

The Hermite equation is

$$u'' - 2xu' + \lambda u = 0. \tag{24.24}$$

It is transformed into an S-L system if we multiply it by $w(x) = e^{-x^2}$. The resulting S-L equation is

$$\frac{d}{dx}\left[e^{-x^2}\frac{du}{dx}\right] + \lambda e^{-x^2}u = 0. \tag{24.25}$$

The function $u$ is an eigenfunction of (24.25) corresponding to the eigenvalue $\lambda$ if and only if it is a solution of (24.24). Solutions of this DE corresponding to $\lambda = 2n$ are the Hermite polynomials $H_n(x)$. The boundary term corresponding to the two eigenfunctions $u_1(x)$ and $u_2(x)$ having the respective eigenvalues $\lambda_1$ and $\lambda_2 \neq \lambda_1$ is

$$\{e^{-x^2}[u_1(x)u_2'(x) - u_2(x)u_1'(x)]\}\Big|_a^b.$$

This vanishes for polynomial eigenfunctions such as the $H_n$ if $a = -\infty$ and $b = +\infty$. We can therefore write $\int_{-\infty}^{\infty} e^{-x^2}H_n(x)H_m(x)\,dx = 0$ if $m \neq n$. ■
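The two polynomial orthogonality relations can be checked by quadrature. The Python sketch below generates $P_n$ and $H_n$ from their standard three-term recurrences, $(k + 1)P_{k+1} = (2k + 1)xP_k - kP_{k-1}$ and $H_{k+1} = 2xH_k - 2kH_{k-1}$, and integrates with the composite Simpson rule. Two assumptions are ours: the infinite Hermite interval is truncated to $[-10, 10]$ (the factor $e^{-x^2}$ makes the tails negligible), and the diagonal checks use the standard normalizations $\int_{-1}^1 P_n^2\,dx = 2/(2n + 1)$ and $\int_{-\infty}^{\infty} e^{-x^2}H_n^2\,dx = 2^nn!\sqrt{\pi}$. The helper names are illustrative.

```python
import math

def legendre(n, x):
    """P_n(x) from the recurrence (k+1) P_{k+1} = (2k+1) x P_k - k P_{k-1}."""
    p_prev, p = 1.0, x
    if n == 0:
        return p_prev
    for k in range(1, n):
        p_prev, p = p, ((2 * k + 1) * x * p - k * p_prev) / (k + 1)
    return p

def hermite(n, x):
    """Physicists' H_n(x) from the recurrence H_{k+1} = 2x H_k - 2k H_{k-1}."""
    h_prev, h = 1.0, 2.0 * x
    if n == 0:
        return h_prev
    for k in range(1, n):
        h_prev, h = h, 2 * x * h - 2 * k * h_prev
    return h

def simpson(g, a, b, n=2000):
    """Composite Simpson rule for the integral of g from a to b (n must be even)."""
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3

def legendre_inner(n, m):
    return simpson(lambda x: legendre(n, x) * legendre(m, x), -1.0, 1.0)

def hermite_inner(n, m, L=10.0):
    # The interval (-inf, inf) is truncated to [-L, L]; e^{-x^2} makes the tails negligible.
    return simpson(lambda x: math.exp(-x * x) * hermite(n, x) * hermite(m, x), -L, L)
```

For example, `legendre_inner(2, 4)` and `hermite_inner(2, 4)` should both be zero to within quadrature error, while `legendre_inner(3, 3)` should be close to $2/7$.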


24.6 SOLDEs with Constant Coefficients 

The SOLDEs with constant coefficients occur frequently and their solutions are easily accessible. In fact, we need not confine ourselves to second-order equations. The most general $n$th-order linear differential equation (NOLDE) with constant coefficients can be written as

$$L[y] = y^{(n)} + a_{n-1}y^{(n-1)} + \cdots + a_1y' + a_0y = r(x). \tag{24.26}$$

The corresponding homogeneous NOLDE (HNOLDE) is obtained by setting $r(x) = 0$.
24.6.1 The Homogeneous Case

The solution to the HNOLDE

$$L[y] = y^{(n)} + a_{n-1}y^{(n-1)} + \cdots + a_1y' + a_0y = 0 \tag{24.27}$$

can be found by making the exponential substitution $y = e^{\lambda x}$, which results in the equation $L[e^{\lambda x}] = (\lambda^n + a_{n-1}\lambda^{n-1} + \cdots + a_1\lambda + a_0)e^{\lambda x} = 0$. This equation will hold only if $\lambda$ is a root of the characteristic polynomial

$$p(\lambda) = \lambda^n + a_{n-1}\lambda^{n-1} + \cdots + a_1\lambda + a_0,$$

which, by the fundamental theorem of algebra, can be written as

$$p(\lambda) = (\lambda - \lambda_1)^{k_1}(\lambda - \lambda_2)^{k_2}\cdots(\lambda - \lambda_m)^{k_m}. \tag{24.28}$$

The $\lambda_j$ are the distinct roots of $p(\lambda)$, with $\lambda_j$ having multiplicity $k_j$.

It is convenient to introduce $D = d/dx$ and define the differential operator

$$L = p(D) = D^n + a_{n-1}D^{n-1} + \cdots + a_1D + a_0.$$

Since $D - \gamma$ and $D - \lambda$ commute for arbitrary constants $\gamma$ and $\lambda$, we can unambiguously factor the above and obtain

$$L = p(D) = (D - \lambda_1)^{k_1}(D - \lambda_2)^{k_2}\cdots(D - \lambda_m)^{k_m}. \tag{24.29}$$

In preparation for finding the most general solution for Equation (24.27), we first note that

$$(D - \lambda)e^{\lambda x} = \frac{d}{dx}e^{\lambda x} - \lambda e^{\lambda x} = 0 \tag{24.30}$$

and

$$(D - \lambda)(x^re^{\lambda x}) = \frac{d}{dx}(x^re^{\lambda x}) - \lambda x^re^{\lambda x} = rx^{r-1}e^{\lambda x}.$$

If we apply $D - \lambda$ twice, we get

$$(D - \lambda)^2(x^re^{\lambda x}) = (D - \lambda)(rx^{r-1}e^{\lambda x}) = r(r - 1)x^{r-2}e^{\lambda x}$$

and in general,

$$(D - \lambda)^k(x^re^{\lambda x}) = r(r - 1)\cdots(r - k + 1)x^{r-k}e^{\lambda x},$$

which, for $k = r$, gives

$$(D - \lambda)^r(x^re^{\lambda x}) = r!\,e^{\lambda x}.$$

If we apply $D - \lambda$ one more time, we get zero by (24.30). Therefore,

$$(D - \lambda)^k(x^re^{\lambda x}) = 0 \quad\text{if } k > r.$$

The functions in the sets

$$\{x^{r_1}e^{\lambda_1x}\}_{r_1=0}^{k_1-1}, \quad \{x^{r_2}e^{\lambda_2x}\}_{r_2=0}^{k_2-1}, \quad \ldots, \quad \{x^{r_m}e^{\lambda_mx}\}_{r_m=0}^{k_m-1} \tag{24.31}$$

are all solutions of Equation (24.27). For example, an element of the first set yields

$$\begin{aligned} L[x^{r_1}e^{\lambda_1x}] &= (D - \lambda_1)^{k_1}(D - \lambda_2)^{k_2}\cdots(D - \lambda_m)^{k_m}(x^{r_1}e^{\lambda_1x}) \\ &= (D - \lambda_2)^{k_2}\cdots(D - \lambda_m)^{k_m}\underbrace{(D - \lambda_1)^{k_1}(x^{r_1}e^{\lambda_1x})}_{=\,0 \text{ because } k_1 > r_1} = 0. \end{aligned}$$

If the root $\lambda$ is complex and the coefficients of the DE are real, then the complex conjugate $\lambda^*$ is also a root (see Problem 24.14). It follows that whenever $x^{r_j}e^{\lambda_jx}$ is a solution of the DE for complex $\lambda_j$, so is $x^{r_j}e^{\lambda_j^*x}$. Thus, writing $\lambda_j = \alpha_j + i\beta_j$ and using the linearity of L, we conclude that

$$x^{r_j}e^{\alpha_jx}\cos\beta_jx \quad\text{and}\quad x^{r_j}e^{\alpha_jx}\sin\beta_jx, \qquad\text{where } r_j = 0, 1, \ldots, k_j - 1,$$

are all solutions of (24.27).

It is easily proved that the functions $x^{r_j}e^{\lambda_jx}$ are linearly independent (see Problem 24.13). Furthermore, $\sum_{j=1}^m k_j = n$ by Equation (24.28). Therefore, the set

$$\{x^{r_j}e^{\lambda_jx}\}, \quad\text{where } r_j = 0, 1, \ldots, k_j - 1 \text{ and } j = 1, 2, \ldots, m,$$

contains exactly $n$ elements. We have thus shown that there are at least $n$ linearly independent solutions for the HNOLDE of Equation (24.27). In fact, it can be shown that there are exactly $n$ linearly independent solutions.


Box 24.6.1. Let $\lambda_1, \lambda_2, \ldots, \lambda_m$ be the roots of the characteristic polynomial of the real HNOLDE of Equation (24.27), and let the respective roots have multiplicities $k_1, k_2, \ldots, k_m$. Then the functions $x^{r_j}e^{\lambda_j x}$, where $r_j = 0, 1, \ldots, k_j - 1$, are a basis of solutions of Equation (24.27).
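To make the operator identities above concrete, here is a small Python sketch (our own illustration, not part of the original text). It stores $q(x)e^{\lambda x}$ as the coefficient list of the polynomial $q$ and applies each factor through $(D-\mu)[q\,e^{\lambda x}] = [q' + (\lambda-\mu)q]\,e^{\lambda x}$, which for $\mu = \lambda$ reduces to the rule behind (24.30). The helper names and the sample roots are arbitrary choices made for the illustration.

```python
# Sketch: check Box 24.6.1 by exact polynomial algebra.
# A function q(x) e^{lam x} is stored as the coefficient list of q,
# with q[i] multiplying x**i. Roots may be ints or complex numbers.

def apply_factor(q, lam, mu):
    """Polynomial part of (D - mu)[q(x) e^{lam x}], i.e. q' + (lam - mu) q."""
    out = [(lam - mu) * c for c in q]
    for i in range(1, len(q)):
        out[i - 1] += i * q[i]        # add the derivative q'
    return out

def apply_L(q, lam, roots):
    """Apply L = prod_j (D - lam_j)^{k_j} to q(x) e^{lam x};
    `roots` maps each root lam_j to its multiplicity k_j."""
    for mu, k in roots.items():
        for _ in range(k):
            q = apply_factor(q, lam, mu)
    return q

def monomial(r):
    """Coefficient list of x^r."""
    return [0] * r + [1]

def basis(roots):
    """Pairs (r_j, lam_j) with r_j = 0, ..., k_j - 1, as in Box 24.6.1."""
    return [(r, lam) for lam, k in roots.items() for r in range(k)]

# Characteristic polynomial (lam - 1)^2 (lam + 3) (lam^2 + 4), so n = 5
roots = {1: 2, -3: 1, 2j: 1, -2j: 1}
for r, lam in basis(roots):
    assert all(c == 0 for c in apply_L(monomial(r), lam, roots))
print(len(basis(roots)), "basis functions, all annihilated by L")
```

Because every coefficient stays a small integer (or a complex number with integer parts), the zeros come out exactly; applying $(D-2)^3$ to $x^3e^{2x}$ with the same helpers returns the list for $3!\,e^{2x}$.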


Example 24.6.1. An equation that is used in both mechanics and circuit theory is

$$\frac{d^2y}{dt^2} + a\frac{dy}{dt} + by = 0 \quad\text{with } a, b > 0. \tag{24.32}$$

Its characteristic polynomial is $p(\lambda) = \lambda^2 + a\lambda + b$, which has the roots $\lambda_1 = \tfrac12\left(-a + \sqrt{a^2 - 4b}\right)$ and $\lambda_2 = \tfrac12\left(-a - \sqrt{a^2 - 4b}\right)$.

We can distinguish three different possible motions depending on the relative sizes 
of a and b. 

(a) $a^2 > 4b$ (overdamped): Here we have two distinct simple roots. The multiplicities are both one: $k_1 = k_2 = 1$ (see Box 24.6.1). Therefore, the power of $t$ for both solutions is zero ($r_1 = r_2 = 0$). Let $\gamma = \tfrac12\sqrt{a^2 - 4b}$. Then the most general solution is

$$y(t) = e^{-at/2}\left(c_1e^{\gamma t} + c_2e^{-\gamma t}\right).$$

Since $a > 2\gamma$, both exponents $-a/2 \pm \gamma$ are negative. This solution starts at $y = c_1 + c_2$ at $t = 0$ and decays without oscillating; so, as $t \to \infty$, $y(t) \to 0$.

(b) $a^2 = 4b$ (critically damped): In this case we have one multiple root of order 2 ($k_1 = 2$); therefore, the power of $t$ can be zero or 1 ($r_1 = 0, 1$). Thus, the general solution is

$$y(t) = c_1te^{-at/2} + c_0e^{-at/2}.$$

This solution starts at $y(0) = c_0$ at $t = 0$, reaches a maximum (or minimum) at $t = 2/a - c_0/c_1$, and subsequently approaches zero asymptotically (see Problem 24.23).

(c) $a^2 < 4b$ (underdamped): Once more, we have two distinct simple roots. The multiplicities are both one ($k_1 = k_2 = 1$); therefore, the power of $t$ for both solutions is zero ($r_1 = r_2 = 0$). Let $\omega = \tfrac12\sqrt{4b - a^2}$. Then $\lambda_1 = -a/2 + i\omega$ and $\lambda_2 = \lambda_1^*$. The roots are complex, and the most general solution is thus of the form

$$y(t) = e^{-at/2}(c_1\cos\omega t + c_2\sin\omega t) = Ae^{-at/2}\cos(\omega t + \alpha).$$

The solution is a harmonic variation with a decaying amplitude $A\exp(-at/2)$. Note that if $a = 0$, the amplitude does not decay. That is why $a$ is called the damping factor (or the damping constant). All three cases are shown in Figure 24.1.

These equations describe either a mechanical system oscillating (with no external driving force) in a viscous (dissipative) fluid, or an electrical circuit consisting of a resistance $R$, an inductance $L$, and a capacitance $C$. For mechanical oscillators, $a = \beta/m$ and $b = k/m$, where $\beta$ is the dissipative constant related to the drag force $f_{\text{drag}}$ and the velocity $v$ by $f_{\text{drag}} = \beta v$, and $k$ is the spring constant (a measure of the stiffness of the spring).

For RLC circuits, $a = R/L$ and $b = 1/LC$. Thus, the damping factor depends on the relative magnitudes of $R$ and $L$. On the other hand, the frequency

$$\omega = \tfrac12\sqrt{4b - a^2} = \sqrt{\frac{1}{LC} - \frac{R^2}{4L^2}}$$

depends on all three elements. In particular, for $R \geq 2\sqrt{L/C}$, the circuit does not oscillate. ■
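The three cases can be cross-checked numerically. The Python sketch below (an illustration we add, not from the text) picks the regime from the sign of $a^2 - 4b$, fits the constants of the corresponding general solution to hypothetical initial values $y(0) = y_0$, $y'(0) = v_0$, and estimates the residual $y'' + ay' + by$ by central differences. The sample coefficients, the step `h`, and the tolerance used to detect $a^2 = 4b$ are arbitrary choices.

```python
import math

def classify(a, b, tol=1e-12):
    """Regime of y'' + a y' + b y = 0 with a, b > 0 (Example 24.6.1)."""
    disc = a * a - 4 * b
    if abs(disc) <= tol:
        return "critically damped"
    return "overdamped" if disc > 0 else "underdamped"

def solution(a, b, y0, v0):
    """Closed-form y(t) with y(0) = y0 and y'(0) = v0 in each of the three cases."""
    def decay(t):
        return math.exp(-a * t / 2)
    regime = classify(a, b)
    if regime == "overdamped":
        g = 0.5 * math.sqrt(a * a - 4 * b)          # gamma of case (a)
        # y(0) = c1 + c2 and y'(0) = -a/2 (c1 + c2) + g (c1 - c2)
        d = (v0 + 0.5 * a * y0) / g
        c1, c2 = (y0 + d) / 2, (y0 - d) / 2
        return lambda t: decay(t) * (c1 * math.exp(g * t) + c2 * math.exp(-g * t))
    if regime == "critically damped":
        # y = (c1 t + c0) e^{-at/2}: y(0) = c0 and y'(0) = c1 - a c0 / 2
        c0, c1 = y0, v0 + 0.5 * a * y0
        return lambda t: (c1 * t + c0) * decay(t)
    w = 0.5 * math.sqrt(4 * b - a * a)              # omega of case (c)
    # y(0) = c1 and y'(0) = -a c1 / 2 + w c2
    c1, c2 = y0, (v0 + 0.5 * a * y0) / w
    return lambda t: decay(t) * (c1 * math.cos(w * t) + c2 * math.sin(w * t))

def residual(y, a, b, t, h=1e-3):
    """Central-difference estimate of y'' + a y' + b y at time t."""
    d1 = (y(t + h) - y(t - h)) / (2 * h)
    d2 = (y(t + h) - 2 * y(t) + y(t - h)) / (h * h)
    return d2 + a * d1 + b * y(t)

# One sample (a, b) per regime; for an RLC circuit, a = R/L and b = 1/(LC).
for a, b in [(3.0, 1.0), (2.0, 1.0), (0.5, 4.0)]:
    y = solution(a, b, y0=1.0, v0=0.5)
    worst = max(abs(residual(y, a, b, t)) for t in (0.5, 1.0, 2.0, 5.0))
    print(f"{classify(a, b):17s} max residual {worst:.1e}")
```

The residuals printed are finite-difference errors of order $h^2$, not exact zeros; shrinking `h` much further would let round-off dominate.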



Figure 24.1: The solid thin curve shows the behavior of an overdamped oscillator. The critically damped case is the dashed curve, and the underdamped oscillator is the thick curve.







24.6.2 Central Force Problem 

One of the nicest applications of the theory of DEs, and the one that initiated 
the modern mathematical analysis, is the study of motion of a particle under 
the influence of a central gravitational force. Surprisingly, such a motion can 
be reduced to a one-dimensional problem, and eventually to a SOLDE with 
constant coefficients as follows. 

Subsection 12.2.1 treated the equations of motion of a particle under the influence of a central force. Conservation of angular momentum and the right choice of the initial position and velocity (what amounted to setting $\mathbf L = L_z\hat{\mathbf e}_z = L\hat{\mathbf e}_z$) eliminated the polar angle $\theta$ by assigning it the value $\pi/2$. Thus the particle is confined to the plane perpendicular to the angular momentum vector $\mathbf L = m\,\mathbf r\times\mathbf v$. The set of three complicated DEs (12.20) reduces to a much simpler set consisting of (12.22) and (12.24), which we rewrite here as

$$m\ddot r - \frac{L^2}{mr^3} = F(r), \qquad \dot\varphi = \frac{L}{mr^2}. \tag{24.33}$$


In principle, we can solve the first equation and find $r$ as a function of $t$, then substitute it in the second equation and integrate the result to find $\varphi$ as a function of time. However, it is more desirable to find $r$ as a function of $\varphi$, i.e., to find the shape of the orbit of the moving particle.

In that spirit, we define a new dependent variable $u = 1/r$ and, making multiple use of the chain rule, write the DEs with $u$ as the dependent variable and $\varphi$ as the independent variable. We thus have

$$r = \frac1u, \qquad \dot r = -\frac{1}{u^2}\dot u = -\frac{1}{u^2}\frac{du}{d\varphi}\dot\varphi = -r^2\frac{du}{d\varphi}\,\frac{L}{mr^2} = -\frac{L}{m}\frac{du}{d\varphi},$$

$$\ddot r = -\frac{L}{m}\frac{d}{dt}\left(\frac{du}{d\varphi}\right) = -\frac{L}{m}\frac{d^2u}{d\varphi^2}\dot\varphi = -\frac{L}{m}\frac{d^2u}{d\varphi^2}\,\frac{L}{mr^2} = -\frac{L^2}{m^2}u^2\frac{d^2u}{d\varphi^2}.$$

Substituting for $\ddot r$ and $r$ in terms of $u$ and its derivatives, the first equation in (24.33) yields

$$\frac{d^2u}{d\varphi^2} + u = -\frac{m}{L^2u^2}F\!\left(\frac1u\right). \tag{24.34}$$


Johannes Kepler (1571-1630) was a premature baby and a very delicate child who was brought up by his grandparents. After elementary and secondary schooling, Kepler entered Tübingen University to become a Protestant minister. At Tübingen Kepler was taught astronomy by one of the leading astronomers of the day, Michael Maestlin (1550-1631). The astronomy of the curriculum was, of course, geocentric astronomy. At the end of his first year Kepler got ‘A’s for everything except mathematics. Probably Maestlin was trying to tell him he could do better, because Kepler was in fact one of the select pupils to whom he chose to teach more advanced astronomy by introducing them to the new, heliocentric cosmological system of Copernicus. It was from Maestlin that Kepler learned that the preface to Copernicus’s book, explaining that this was ‘only mathematics’, was not by Copernicus. Kepler seems to have accepted almost instantly that the Copernican system was physically true, and from then on, astronomy and mathematics became his passion. Kepler also worked on optics and wrote a book on it, in which he used the idea of a ‘ray of light’ for the first time.


For the Kepler problem this equation is easy to solve because⁵

$$F(r) = -\frac{K}{r^2} \quad\Rightarrow\quad F\!\left(\frac1u\right) = -Ku^2,$$

and we have

$$\frac{d^2u}{d\varphi^2} + u = \frac{Km}{L^2}. \tag{24.35}$$

Let $v = u - Km/L^2$. Then Equation (24.35) becomes

$$\frac{d^2v}{d\varphi^2} + v = 0.$$

The characteristic polynomial of this equation is $\lambda^2 + 1$, whose roots are $\lambda = \pm i$. These simple roots give rise to the linearly independent solutions $v = \sin\varphi$ and $v = \cos\varphi$. The general solution can therefore be expressed as $v = C_1\cos\varphi + C_2\sin\varphi$, which, using Problem 24.22, can be rewritten as $v = A\cos(\varphi - \varphi_0)$. Therefore,


$$v = u - \frac{Km}{L^2} = A\cos(\varphi - \varphi_0) \quad\Rightarrow\quad u = \frac{Km}{L^2} + A\cos(\varphi - \varphi_0)$$

or

$$r = \frac{1}{(Km/L^2) + A\cos(\varphi - \varphi_0)}. \tag{24.36}$$

This is the equation of a conic section in plane polar coordinates (see Problem 24.15).

We now investigate the details of Equation (24.36). First we note that when $\varphi = \varphi_0$, $r$ is either a maximum or a minimum depending on the sign of $A$. With an ellipse in mind, this corresponds to the (major) axis of the ellipse making an angle $\varphi_0$ with the $x$-axis. Thus setting $\varphi_0 = 0$ corresponds to choosing the axis of the conic section to be our $x$-axis. We adhere to this choice and write

$$r = \frac{1}{(Km/L^2) + A\cos\varphi}. \tag{24.37}$$

Next we want to determine the constant $A$ in terms of the energy of the particle. The potential energy (PE) is clearly $-K/r$. So, let us concentrate on the kinetic energy (KE).

⁵ Although the Kepler problem usually refers to the gravitational central force, we want to keep the discussion general enough so that the electrostatic force is also included. Thus, $K$ can be either $GMm$ or $-k_eq_1q_2$.

The velocity of the particle is given in Equation (12.16) with $\theta = \pi/2$ and $\dot\theta = 0$. Thus


$$\text{KE} = \tfrac12mv^2 = \tfrac12m\left(\dot r^2 + r^2\dot\varphi^2\right) = \tfrac12m\dot r^2 + \frac{L^2}{2mr^2}, \tag{24.38}$$

where we used the second equation in (24.33). The second term in (24.38) is sometimes called the centrifugal potential because (like a potential energy) it is a position-dependent energy that (like a centrifugal force) has resulted from a velocity-dependent term. Differentiating Equation (24.37) with respect to time gives

$$\dot r = \frac{A\dot\varphi\sin\varphi}{\left[(Km/L^2) + A\cos\varphi\right]^2} = Ar^2\dot\varphi\sin\varphi.$$

Squaring and using the second equation in (24.33), we obtain

$$\dot r^2 = A^2r^4\dot\varphi^2\sin^2\varphi = \frac{L^2}{m^2}A^2\sin^2\varphi.$$

We can eliminate the sine term in favor of terms involving $r$ by solving for $A\cos\varphi$ in (24.37):

$$A\cos\varphi = \frac1r - \frac{Km}{L^2} \quad\Rightarrow\quad A^2\sin^2\varphi = A^2 - \left(\frac1r - \frac{Km}{L^2}\right)^2.$$

It follows that

$$\text{KE} = \tfrac12m\,\frac{L^2}{m^2}\left[A^2 - \left(\frac1r - \frac{Km}{L^2}\right)^2\right] + \frac{L^2}{2mr^2} = \frac{L^2A^2}{2m} - \frac{K^2m}{2L^2} + \frac Kr$$


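and

$$E = \text{KE} + \text{PE} = \frac{L^2A^2}{2m} - \frac{K^2m}{2L^2} + \frac Kr - \frac Kr = \frac{L^2A^2}{2m} - \frac{K^2m}{2L^2},$$

so that

$$A = \pm\sqrt{\frac{2mE}{L^2} + \frac{K^2m^2}{L^4}}.$$

To avoid negative signs at later stages, we choose the negative sign now, factor $Km/L^2$ out of $A$, and finally write

$$r = \frac{L^2/(Km)}{1 - \sqrt{2EL^2/(K^2m) + 1}\,\cos\varphi} \equiv \frac{L^2/(Km)}{1 - e\cos\varphi}, \tag{24.39}$$

where

$$e = \sqrt{\frac{2EL^2}{K^2m} + 1} \tag{24.40}$$

is called the eccentricity of the conic section.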


The eccentricity, which by its very definition is never negative, determines the shape of the orbit. Let us concentrate on the interesting case of elliptic orbits, corresponding to $0 < e < 1$, which indicates that the total energy of the particle is negative. Inspection of Problem 24.15 reveals that the semi-major and semi-minor axes of the ellipse are given, respectively, by

$$a^2 = \frac{L^4}{(1-e^2)^2K^2m^2} \quad\text{and}\quad b^2 = \frac{L^4}{(1-e^2)K^2m^2}.$$

Substituting for $e$ from Equation (24.40) and noting that $E < 0$, we obtain

$$a = -\frac{K}{2E} \quad\text{and}\quad b = \frac{L}{\sqrt{-2mE}}. \tag{24.41}$$
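As a consistency check of (24.39) through (24.41), the sketch below computes $e$, the numerator $L^2/(Km)$ of (24.39), and the semi-axes from assumed values of $E$, $L$, $K$, and $m$. It then confirms that the farthest and nearest distances add up to $2a$ and that the energy is recovered at $\varphi = 0$, where $\dot r = 0$. The numbers are illustrative only (units with $K = m = 1$), and the function names are our own.

```python
import math

def orbit_elements(E, L, K, m):
    """Return (e, p, a, b): eccentricity (24.40), p = L^2/(Km) from (24.39),
    and, for a bound orbit (E < 0), the semi-axes of (24.41)."""
    e = math.sqrt(2 * E * L**2 / (K**2 * m) + 1)
    p = L**2 / (K * m)
    if E >= 0:                      # parabola (E = 0) or hyperbola (E > 0)
        return e, p, None, None
    a = -K / (2 * E)
    b = L / math.sqrt(-2 * m * E)
    return e, p, a, b

def r_of_phi(phi, e, p):
    """Orbit equation (24.39): r = p / (1 - e cos(phi))."""
    return p / (1 - e * math.cos(phi))

# Sample bound orbit in units with K = m = 1 (arbitrary illustrative values)
E, L, K, m = -0.3, 0.8, 1.0, 1.0
e, p, a, b = orbit_elements(E, L, K, m)
r_far, r_near = r_of_phi(0.0, e, p), r_of_phi(math.pi, e, p)
print(f"e = {e:.4f}, a = {a:.4f}, b = {b:.4f}")
print("r_far + r_near = 2a:", math.isclose(r_far + r_near, 2 * a))
# At phi = 0 we have rdot = 0, so E = L^2/(2 m r^2) - K/r there
print("energy recovered:", math.isclose(L**2 / (2 * m * r_far**2) - K / r_far, E))
```

With the negative sign chosen for $A$, $\varphi = 0$ is the farthest point of the ellipse, which is why `r_far` is evaluated there.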


The negativity of energy in an elliptic orbit is an indication of the stability of the orbit. The potential energy is negative and larger in absolute value than the kinetic energy. If the total energy is negative (and, of course, constant), the particle cannot move too far away from the center of attraction: far from the center, the magnitude of the PE becomes too small to offset the positive KE, and the total energy could no longer be negative. The absolute value of this total negative energy is called the binding energy. By (24.41), for an ellipse this binding energy is $K/(2a)$.


Kepler’s Laws 

In 1609 Johannes Kepler, the German astronomer, after painstakingly analyzing the motion of Mars for many years, announced what is now called Kepler’s first law of planetary motion: The orbit of Mars is not a circle but an ellipse. In the context of a very resilient tradition, dating back to Pythagoras himself, in which circular orbits were given almost a divine status, this announcement was truly monumental. Kepler had a hunch that all planets obey this same law, but could not prove it. Equation (24.37) is the mathematical statement of Kepler’s first law.

Kepler’s second law of planetary motion states that equal areas are swept out in equal times by the line joining the planet to the center of attraction (the Sun). In other words, the rate of change of the area is a constant. This can be seen by referring to Figure 24.2 and noting that

$$\Delta A \approx \tfrac12\,r\,\overline{AB} \approx \tfrac12\,r(r\,\Delta\varphi) \quad\Rightarrow\quad \frac{\Delta A}{\Delta t} \approx \tfrac12r^2\frac{\Delta\varphi}{\Delta t} \quad\Rightarrow\quad \frac{dA}{dt} = \tfrac12r^2\dot\varphi.$$

So, by the second equation in (24.33), $dA/dt = L/(2m)$, which is a constant.

Figure 24.2: The shaded area is almost equal to the area of the triangle SAB.

After the first two laws, Kepler spent another 12 years searching for a “harmony” in the motion of planets. The imperfection he injected into the planetary motions by the assumption of elliptical orbits prompted him to seek some sort of compensation. His third law was precisely that. He felt that this law, with its precise mathematical structure, gave sufficient harmony to the waltz of planets around the Sun to offset the imperfection of elliptical orbits. Kepler’s third law of planetary motion relates the period of each planet to the length of its major axis. To derive it, we use Kepler’s second law:

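$$T = \frac{\pi ab}{dA/dt} = \frac{\pi ab}{L/(2m)} = \frac{2\pi abm}{b\sqrt{mK/a}} = \frac{2\pi a^{3/2}m^{1/2}}{\sqrt K},$$

where $\pi ab$ is the area of the ellipse and we used Equation (24.41) in the form $L = b\sqrt{-2mE} = b\sqrt{mK/a}$. For gravity, $K = GMm$, and squaring both sides of the above equation gives

$$T^2 = \frac{4\pi^2a^3}{GM}.$$

This is the mathematical statement of Kepler’s third law.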
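Kepler’s second and third laws can also be observed in a direct numerical integration of Newton’s equations in the orbital plane. The Python sketch below assumes units with $GM = 1$ and uses a fixed-step classical Runge-Kutta (RK4) integrator, which is our choice of method, not the book’s. It starts the planet at perihelion, records the areal velocity $\frac12(x\dot y - y\dot x) = \frac12r^2\dot\varphi$ along the way, and compares the measured period with $T = 2\pi\sqrt{a^3/GM}$, where $a$ comes from (24.41). The initial data and step size are hypothetical.

```python
import math

def rk4_step(state, dt, GM):
    """One classical Runge-Kutta step for (x, y, vx, vy) under a = -GM r / |r|^3."""
    def f(s):
        x, y, vx, vy = s
        r3 = (x * x + y * y) ** 1.5
        return (vx, vy, -GM * x / r3, -GM * y / r3)
    k1 = f(state)
    k2 = f(tuple(s + 0.5 * dt * k for s, k in zip(state, k1)))
    k3 = f(tuple(s + 0.5 * dt * k for s, k in zip(state, k2)))
    k4 = f(tuple(s + dt * k for s, k in zip(state, k3)))
    return tuple(s + dt / 6 * (c1 + 2 * c2 + 2 * c3 + c4)
                 for s, c1, c2, c3, c4 in zip(state, k1, k2, k3, k4))

def one_orbit(r0, v0, GM=1.0, dt=1e-3, max_steps=10**6):
    """Start at (r0, 0) with velocity (0, v0) and integrate until y next
    crosses zero from below. Returns (period, min and max of dA/dt)."""
    state, t, rates = (r0, 0.0, 0.0, v0), 0.0, []
    for _ in range(max_steps):
        x, y, vx, vy = state
        rates.append(0.5 * (x * vy - y * vx))       # dA/dt = (1/2) r^2 phidot
        new = rk4_step(state, dt, GM)
        if y < 0.0 <= new[1]:
            t_cross = t + dt * (-y) / (new[1] - y)   # linear interpolation
            return t_cross, min(rates), max(rates)
        state, t = new, t + dt
    raise RuntimeError("no full revolution found; is the orbit bound?")

def third_law_period(r0, v0, GM=1.0):
    """T = 2 pi sqrt(a^3 / GM) with a = -GM / (2 eps), eps = energy per unit mass."""
    eps = 0.5 * v0**2 - GM / r0
    a = -GM / (2 * eps)
    return 2 * math.pi * math.sqrt(a**3 / GM)

T_num, lo, hi = one_orbit(1.0, 1.2)
print(f"numerical T = {T_num:.6f}, third law T = {third_law_period(1.0, 1.2):.6f}")
print(f"dA/dt stays in [{lo:.10f}, {hi:.10f}]; L/(2m) = {0.5 * 1.0 * 1.2}")
```

The areal velocity is constant only up to the integrator’s truncation error, so the spread between `lo` and `hi` is tiny but not exactly zero.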


24.6.3 The Inhomogeneous Case 

When a driving force acts on a physical system, it will appear as the inhomogeneous term of the NOLDE. For the particular, but important, case in which the inhomogeneous term is a product of polynomials and exponentials, the solution can be found in closed form. This subsection shows how this is done.

We assume that the inhomogeneous term in Equation (24.26) is of the form $r(x) = \sum_k p_k(x)e^{\lambda_kx}$, where $p_k(x)$ are polynomials and $\lambda_k$ are (complex) constants. The most general solution of Equation (24.26) is a linear combination of a basis of solutions (as given in Box 24.6.1) of the homogeneous NOLDE and a particular solution of the NOLDE. We need to find the latter. Because $L$ is a linear operator, it is clear that if $y_1$ is a particular solution of $L[y] = r_1(x)$ and $y_2$ that of $L[y] = r_2(x)$, then $y_1 + y_2$ is a solution of $L[y] = r_1(x) + r_2(x)$. This suggests breaking up the inhomogeneous term into smaller pieces. Thus, no generality is lost if we restrict $r(x)$ to be $p(x)e^{\lambda x}$, where $p(x)$ is a polynomial.

The reader may verify that, for any differentiable function $f$, we have

$$(D-\lambda)[e^{\lambda x}f(x)] = e^{\lambda x}f'(x), \qquad (D-\lambda)^2[e^{\lambda x}f(x)] = e^{\lambda x}f''(x),$$

and, in general,

$$(D-\lambda)^k[e^{\lambda x}f(x)] = e^{\lambda x}f^{(k)}(x).$$

In particular, if $p(x)$ is a polynomial of degree $n$, then

$$(D-\lambda)^ku = e^{\lambda x}p(x)$$

has a solution of the form $u = e^{\lambda x}q(x)$, where $q(x)$ is a polynomial of degree $n + k$ that is the primitive (indefinite integral) of $p(x)$ of order $k$ [so that the $k$th derivative of $q(x)$ is $p(x)$].

If $\nu \neq \lambda$, then the reader may check that

$$(D-\nu)[e^{\lambda x}f(x)] = e^{\lambda x}[(\lambda-\nu)f(x) + f'(x)]$$

and, therefore, $(D-\nu)u = e^{\lambda x}p(x)$ has a solution of the form $u = e^{\lambda x}q(x)$, where $q(x)$ is a polynomial of degree $n$, the same degree as $p(x)$: the equation $(\lambda-\nu)q + q' = p$ can be solved for the coefficients of $q$ starting from the highest power. Applying the last two equations repeatedly leads to

Box 24.6.2. The NOLDE $L[y] = e^{\lambda x}S(x)$, where $S(x)$ is a polynomial, has the particular solution $e^{\lambda x}q(x)$, where $q(x)$ is also a polynomial. The degree of $q(x)$ equals that of $S(x)$ unless $\lambda = \lambda_j$, a root of the characteristic polynomial of $L$, in which case the degree of $q(x)$ exceeds that of $S(x)$ by $k_j$, the multiplicity of $\lambda_j$.
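The recipe of Box 24.6.2 can be automated. Combining the factor rule $(D-\nu)[e^{\lambda x}f] = e^{\lambda x}(D + \lambda - \nu)f$ over all factors of $L = p(D)$ gives $L[e^{\lambda x}q] = e^{\lambda x}p(D+\lambda)q$, and Taylor’s formula writes $p(D+\lambda) = \sum_k \frac{p^{(k)}(\lambda)}{k!}D^k$, whose first nonzero coefficient sits at $k = k_j$, the multiplicity of $\lambda$ (zero if $\lambda$ is not a root). The Python sketch below uses this to find the coefficients of $q$ by back substitution, from the highest power down. It is a minimal illustration under our own conventions: coefficient lists indexed by power and exact `Fraction` arithmetic, so $\lambda$ must be rational. The free low-order coefficients, which multiply homogeneous solutions, are set to zero.

```python
from fractions import Fraction
from math import comb, factorial

def shifted_coeffs(p, lam):
    """Coefficients s_k = p^{(k)}(lam)/k! of p(D + lam), where p[j] multiplies
    lambda^j in the characteristic polynomial of L = p(D)."""
    n = len(p) - 1
    return [sum(p[j] * comb(j, k) * lam ** (j - k) for j in range(k, n + 1))
            for k in range(n + 1)]

def particular_q(p, lam, S):
    """Polynomial q (coefficient list) with L[e^{lam x} q(x)] = e^{lam x} S(x).
    Returns (q, k), where k is the multiplicity of lam as a root of p (0 if none)."""
    s = shifted_coeffs(p, lam)
    k = next(i for i, c in enumerate(s) if c != 0)   # s_0 = ... = s_{k-1} = 0
    deg = len(S) - 1
    q = [Fraction(0)] * (deg + k + 1)               # q_0 .. q_{k-1} stay zero
    # The coefficient of x^m in sum_i s_i q^{(i)}(x) is sum_i s_i q_{m+i} (m+i)!/m!;
    # solve for q_{m+k} from the top power down.
    for m in range(deg, -1, -1):
        rest = sum(s[i] * q[m + i] * (factorial(m + i) // factorial(m))
                   for i in range(k + 1, len(s)) if m + i < len(q))
        q[m + k] = (Fraction(S[m]) - rest) / (s[k] * (factorial(m + k) // factorial(m)))
    return q, k

# Example 24.6.2: y'' + y = x e^x and y'' - y = x e^x (p[j] multiplies lambda^j)
print(particular_q([1, 0, 1], 1, [0, 1]))    # q = (x - 1)/2; lambda = 1 is not a root
print(particular_q([-1, 0, 1], 1, [0, 1]))   # q = (x^2 - x)/4; lambda = 1 is a simple root
```

The two calls reproduce the particular solutions found by hand in Example 24.6.2.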


Once we know the form of the particular solution of the NOLDE, we can 
find the coefficients in the polynomial of the solution by substituting in the 
NOLDE and matching the powers on both sides. 

Example 24.6.2. We find the unique solutions of two differential equations subject to the initial conditions $y(0) = 0$ and $y'(0) = 1$.

(a) The first DE we want to consider is

$$y'' + y = xe^x. \tag{24.42}$$

The characteristic polynomial is $\lambda^2 + 1$, whose roots are $\lambda_1 = i$ and $\lambda_2 = -i$. Thus, a basis of solutions is $\{\cos x, \sin x\}$. To find the particular solution we note that $\lambda$ (the coefficient of $x$ in the exponential part of the inhomogeneous term) is 1, which is neither of the roots $\lambda_1$ and $\lambda_2$. Thus, the particular solution is of the form $q(x)e^x$, where $q(x) = Ax + B$ is of degree 1 [the same degree as that of $S(x) = x$]. We now substitute $u = (Ax + B)e^x$ in Equation (24.42) to obtain the relation

$$Axe^x + (2A + B)e^x + (Ax + B)e^x = xe^x.$$

Matching the coefficients, we have

$$2A = 1 \quad\text{and}\quad 2A + 2B = 0 \quad\Rightarrow\quad A = \tfrac12 = -B.$$

Thus, the most general solution is

$$y = c_1\cos x + c_2\sin x + \tfrac12(x - 1)e^x.$$
Imposing the given initial conditions yields $0 = y(0) = c_1 - \tfrac12$ and $1 = y'(0) = c_2$. Thus,

$$y = \tfrac12\cos x + \sin x + \tfrac12(x - 1)e^x$$

is the unique solution.

(b) The next DE we want to consider is

$$y'' - y = xe^x. \tag{24.43}$$

Here $p(\lambda) = \lambda^2 - 1$, and the roots are $\lambda_1 = 1$ and $\lambda_2 = -1$. A basis of solutions is $\{e^x, e^{-x}\}$. To find a particular solution, we note that $S(x) = x$ and $\lambda = 1 = \lambda_1$. Box 24.6.2 then implies that $q(x)$ must be of degree 2 because $\lambda_1$ is a simple root, i.e., $k_1 = 1$. We, therefore, try

$$q(x) = Ax^2 + Bx + C \quad\Rightarrow\quad u = (Ax^2 + Bx + C)e^x.$$

Taking the derivatives and substituting in Equation (24.43) yields two equations,

$$4A = 1 \quad\text{and}\quad A + B = 0,$$

whose solution is $A = -B = 1/4$. Note that $C$ is not determined, because $Ce^x$ is a solution of the homogeneous DE corresponding to Equation (24.43), so when $L$ is applied to $u$, it eliminates the term $Ce^x$. Another way of looking at the situation is to note that the most general solution to (24.43) is of the form

$$y = c_1e^x + c_2e^{-x} + \left(\tfrac14x^2 - \tfrac14x + C\right)e^x.$$

The term $Ce^x$ could be absorbed in $c_1e^x$. We, therefore, set $C = 0$, apply the initial conditions, and find the unique solution

$$y = \tfrac54\sinh x + \tfrac14(x^2 - x)e^x. \quad ■$$
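The two unique solutions can be verified numerically. The sketch below (an added illustration, with our own helper names) evaluates each closed-form answer, estimates derivatives by central differences with an assumed step $h = 10^{-4}$, and checks both the differential equation at a few sample points and the initial conditions $y(0) = 0$, $y'(0) = 1$.

```python
import math

def derivs(f, x, h=1e-4):
    """Central-difference estimates of f'(x) and f''(x)."""
    d1 = (f(x + h) - f(x - h)) / (2 * h)
    d2 = (f(x + h) - 2 * f(x) + f(x - h)) / (h * h)
    return d1, d2

def ya(x):
    """Solution of (24.42) found in part (a)."""
    return 0.5 * math.cos(x) + math.sin(x) + 0.5 * (x - 1) * math.exp(x)

def yb(x):
    """Solution of (24.43) found in part (b)."""
    return 1.25 * math.sinh(x) + 0.25 * (x * x - x) * math.exp(x)

def max_residual(y, sign, xs):
    """max |y'' + sign*y - x e^x| over xs; sign = +1 for (24.42), -1 for (24.43)."""
    return max(abs(derivs(y, x)[1] + sign * y(x) - x * math.exp(x)) for x in xs)

xs = [0.25 * i for i in range(-4, 9)]        # sample points in [-1, 2]
for name, y, sign in [("(a)", ya, +1), ("(b)", yb, -1)]:
    print(name, "y(0) =", y(0.0), " y'(0) ~", round(derivs(y, 0.0)[0], 6),
          " max residual ~", f"{max_residual(y, sign, xs):.1e}")
```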

The inhomogeneous DE (IDE) $L[y] = r(x)$ can be thought of as a machine (or a black box) that produces a function $y(x)$ when a function $r(x)$ is fed into it. Such an interpretation is common in the study of electrical or acoustic filters. A signal, the function $r(x)$, is sent into the filter, and a second function, $y(x)$, is received as an output. In such a context, by far the most important input signal is a sinusoidal function of the general form $r(t) = A\cos(\omega t + \alpha)$, which, with $B = Ae^{i\alpha}$, can be written in complex notation as (see Example 18.2.3)

$$r(t) = \mathrm{Re}(R(t)), \quad\text{where } R(t) = Be^{i\omega t} = Ae^{i(\omega t + \alpha)},$$

where $A$, $B$, $\alpha$, and $\omega$, the angular frequency, are all constants, and $t$ represents time (the independent variable). Assuming that $i\omega$ is not a root of $p(\lambda)$, the characteristic polynomial of $L$, Box 24.6.2 suggests a particular (complex) solution, $U = C(\omega)e^{i\omega t}$, where $C(\omega)$ is an ($\omega$-dependent) constant. To determine it, we substitute $U$ in $L[U] = Be^{i\omega t}$:

$$L[U] = L[C(\omega)e^{i\omega t}] = C(\omega)L[e^{i\omega t}] = C(\omega)p(i\omega)e^{i\omega t},$$

so that

$$L[U] = Be^{i\omega t} \quad\Rightarrow\quad C(\omega)p(i\omega)e^{i\omega t} = Be^{i\omega t} \quad\Rightarrow\quad C(\omega) = \frac{B}{p(i\omega)}.$$
Writing the complex numbers in polar form

$$C(\omega) = \rho(\omega)e^{i\gamma(\omega)}, \qquad B = Ae^{i\alpha}, \qquad p(i\omega) = P(\omega)e^{i\theta(\omega)},$$

we obtain

$$\rho(\omega) = \frac{A}{P(\omega)} \quad\text{and}\quad \gamma(\omega) = \alpha - \theta(\omega).$$

The real solution, $u(t) = \mathrm{Re}[U(t)]$, will then be

$$u(t) = \mathrm{Re}[C(\omega)e^{i\omega t}] = \rho(\omega)\cos[\omega t + \gamma(\omega)] = \frac{A}{P(\omega)}\cos[\omega t + \alpha - \theta(\omega)]. \tag{24.44}$$


The function $C(\omega)$ is called the transfer function associated with the linear operator $L$. Equation (24.44) shows that the output $u(t)$ has the same frequency as the input. It also indicates that the amplitude of $u(t)$ is frequency-dependent, making it possible to obtain large output amplitudes by varying the frequency until $P(\omega)$ is minimum. This is the phenomenon of resonance in AC circuits.
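Equation (24.44) translates directly into complex arithmetic. The Python sketch below is illustrative only, with a hypothetical operator $L = D^2 + 0.4D + 4$ and input $1.5\cos(1.7t + 0.3)$. It forms $C(\omega) = B/p(i\omega)$ with $B = Ae^{i\alpha}$, reads off $\rho(\omega) = |C|$ and $\gamma(\omega) = \arg C$, and checks by finite differences that $u(t) = \rho\cos(\omega t + \gamma)$ reproduces the input. Note that `cmath.phase` returns an angle in $(-\pi, \pi]$, so $\gamma$ agrees with $\alpha - \theta(\omega)$ only modulo $2\pi$ in general.

```python
import cmath
import math

def char_poly(coeffs, z):
    """p(z) = sum_j coeffs[j] z^j for L = p(D)."""
    return sum(c * z**j for j, c in enumerate(coeffs))

def steady_state(coeffs, A, alpha, omega):
    """Particular solution (24.44) of L[u] = A cos(omega t + alpha).
    Returns (u, rho, gamma) with u(t) = rho cos(omega t + gamma)."""
    p = char_poly(coeffs, 1j * omega)
    if p == 0:
        raise ValueError("i*omega is a root of p; (24.44) does not apply")
    C = A * cmath.exp(1j * alpha) / p     # C(omega) = B / p(i omega)
    rho, gamma = abs(C), cmath.phase(C)   # rho = A/P(omega), gamma = alpha - theta (mod 2 pi)
    return (lambda t: rho * math.cos(omega * t + gamma)), rho, gamma

def residual2(coeffs, u, t, h=1e-3):
    """Finite-difference value of c0 u + c1 u' + c2 u'' for a second-order L."""
    d1 = (u(t + h) - u(t - h)) / (2 * h)
    d2 = (u(t + h) - 2 * u(t) + u(t - h)) / (h * h)
    return coeffs[0] * u(t) + coeffs[1] * d1 + coeffs[2] * d2

# Hypothetical sample: L = D^2 + 0.4 D + 4, input 1.5 cos(1.7 t + 0.3)
coeffs = [4.0, 0.4, 1.0]
u, rho, gamma = steady_state(coeffs, 1.5, 0.3, 1.7)
for t in (0.0, 0.7, 2.3):
    print(t, residual2(coeffs, u, t), 1.5 * math.cos(1.7 * t + 0.3))
```

The two printed columns should agree to within the finite-difference error, which is the statement that $u$ is a particular solution.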


Example 24.6.3. Let us apply the analysis above to Example 24.6.1 and, for definiteness, take the underdamped case. In this case, $4b > a^2$, and $\omega_0 = \sqrt b$ is called the natural frequency of the system. The characteristic polynomial is $p(\lambda) = \lambda^2 + a\lambda + b$. Thus,

$$p(i\omega) = -\omega^2 + i\omega a + b = (\omega_0^2 - \omega^2) + i\omega a$$

and

$$P(\omega) = \sqrt{(\omega_0^2 - \omega^2)^2 + \omega^2a^2}, \qquad \theta(\omega) = \tan^{-1}\!\left(\frac{\omega a}{\omega_0^2 - \omega^2}\right).$$

The amplitude of the output signal, sometimes called the gain function, is

$$\rho(\omega) = \frac{A}{P(\omega)} = \frac{A}{\sqrt{(\omega_0^2 - \omega^2)^2 + \omega^2a^2}}.$$

For small damping, the minimum of the denominator occurs at $\omega \approx \omega_0$, that is, when the driving frequency is very nearly equal to the natural frequency (the exact minimum lies at $\omega^2 = \omega_0^2 - a^2/2$, provided $a^2 < 2\omega_0^2$). At $\omega = \omega_0$ we have $\rho(\omega_0) = A/(\omega_0a)$, showing that the output signal will have a large amplitude when $a$, the damping coefficient, is small.

We have considered only the particular solution, $u(t)$, because the most general solution

$$y(t) = Ke^{-at/2}\cos(\omega_1t + \beta) + u(t),$$

in which $K$ and $\beta$ are constants and $\omega_1 = \tfrac12\sqrt{4b - a^2}$ is the frequency of Example 24.6.1(c), eventually reduces to $u(t)$. The first term on the RHS, the transient term, decays to zero. The rate of this decay is determined by the time constant $2/a$, the time interval during which the amplitude of the transient term drops to $1/e$ of its initial value. ■
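The location of the resonance peak can be checked by brute force. Minimizing $(\omega_0^2 - s)^2 + sa^2$ over $s = \omega^2$ gives $s = \omega_0^2 - a^2/2$, and the Python sketch below compares that value with a grid search over the gain function. The parameters $a = 0.2$, $\omega_0 = 2$, the amplitude $A = 1$, and the grid size are illustrative assumptions.

```python
import math

def gain(omega, a, omega0, A=1.0):
    """rho(omega) = A / P(omega) for L = D^2 + a D + omega0^2 (Example 24.6.3)."""
    return A / math.sqrt((omega0**2 - omega**2) ** 2 + (omega * a) ** 2)

def peak_exact(a, omega0):
    """Setting d/ds[(omega0^2 - s)^2 + s a^2] = 0 with s = omega^2 gives
    s = omega0^2 - a^2/2; if that is not positive, the gain peaks at omega = 0."""
    s = omega0**2 - a**2 / 2
    return math.sqrt(s) if s > 0 else 0.0

def peak_by_scan(a, omega0, wmax, n=100_001):
    """Brute-force maximum of the gain on a uniform grid over [0, wmax]."""
    grid = (wmax * i / (n - 1) for i in range(n))
    return max(grid, key=lambda w: gain(w, a, omega0))

a, omega0 = 0.2, 2.0                      # illustrative, lightly damped
print("scan:", peak_by_scan(a, omega0, 4.0), " exact:", peak_exact(a, omega0))
print("rho(omega0) =", gain(omega0, a, omega0), " A/(omega0 a) =", 1 / (omega0 * a))
```

For light damping the two frequencies differ only in the third decimal place, which is why the text can say the peak occurs at $\omega \approx \omega_0$.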

The importance of the sinusoidal signal becomes clear when we recall that any periodic signal can be expanded in a Fourier series, $R(t) = \sum_{n=-\infty}^{\infty} b_ne^{in\omega t}$, where $\omega$ is the fundamental frequency. The linearity of $L$ suggests the solution $u(t) = \mathrm{Re}[U(t)]$, where

$$U(t) = \sum_{n=-\infty}^{\infty} C_n(\omega)e^{in\omega t}.$$

Substituting in $L[U] = R(t)$ gives

$$\sum_{n=-\infty}^{\infty} C_n(\omega)p(in\omega)e^{in\omega t} = \sum_{n=-\infty}^{\infty} b_ne^{in\omega t}.$$

Since the functions $e^{in\omega t}$ are mutually orthogonal over a period, we get $C_n(\omega) = b_n/p(in\omega)$ (assuming $p(in\omega) \neq 0$ for every $n$), and

$$u(t) = \mathrm{Re}\left[\sum_{n=-\infty}^{\infty}\frac{b_ne^{in\omega t}}{p(in\omega)}\right].$$

Thus, $u(t)$ is also periodic and has the same fundamental frequency as $r(t)$.
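The Fourier-series recipe amounts to dividing each input coefficient by $p(in\omega)$. In the Python sketch below, the input is a hypothetical truncated square wave, $\frac4\pi\sum_{n\ \text{odd}\,\le 9}\frac{\sin nt}{n} = \mathrm{Re}\sum_n\frac{-4i}{\pi n}e^{int}$, fed into an assumed operator $L = D^2 + 0.5D + 2$; since the text defines the input as $\mathrm{Re}\,R(t)$, a finite set of positive $n$ suffices here. The code builds $C_n = b_n/p(in\omega)$ and checks by finite differences that $L[u] \approx r(t)$. Function names are our own.

```python
import cmath
import math

def periodic_response(coeffs, b, omega):
    """u(t) = Re sum_n b_n e^{i n omega t} / p(i n omega), the periodic solution of
    L[u] = Re R(t) with R(t) = sum_n b_n e^{i n omega t}. coeffs[j] multiplies D^j."""
    def p(z):
        return sum(c * z**j for j, c in enumerate(coeffs))
    C = {}
    for n, bn in b.items():
        pn = p(1j * n * omega)
        if pn == 0:
            raise ValueError(f"p(i {n} omega) = 0: resonance, no periodic solution")
        C[n] = bn / pn                                 # C_n = b_n / p(i n omega)
    def u(t):
        return sum(Cn * cmath.exp(1j * n * omega * t) for n, Cn in C.items()).real
    def r(t):
        return sum(bn * cmath.exp(1j * n * omega * t) for n, bn in b.items()).real
    return u, r

# Hypothetical input: first five odd harmonics of a unit square wave,
# (4/pi) sum_{n odd} sin(n t)/n = Re sum_n (-4i/(pi n)) e^{i n t}.
coeffs = [2.0, 0.5, 1.0]                 # L = D^2 + 0.5 D + 2
b = {n: -4j / (math.pi * n) for n in (1, 3, 5, 7, 9)}
u, r = periodic_response(coeffs, b, omega=1.0)
h = 1e-3
for t in (0.1, 1.0, 2.5):
    lhs = (u(t + h) - 2 * u(t) + u(t - h)) / h**2 + 0.5 * (u(t + h) - u(t - h)) / (2 * h) + 2.0 * u(t)
    print(f"t = {t}: L[u] ~ {lhs:.6f}, r(t) = {r(t):.6f}")
```

Higher harmonics are strongly suppressed because $|p(in\omega)|$ grows like $n^2$, so the output is much smoother than the input.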


24.7 Problems 

24.1. Let $f$ and $g$ be two differentiable functions that are linearly dependent. Show that their Wronskian vanishes. (Note that $f$ and $g$ need not be solutions of a homogeneous SOLDE.)

24.2. Show that if $(f_1, f_1')$ and $(f_2, f_2')$ are linearly dependent at one point, then $f_1$ and $f_2$ are linearly dependent at all $a < x < b$. Here $f_1$ and $f_2$ are solutions of the DE of (24.4). Hint: Derive the identity

$$W(f_1, f_2; x_2) = W(f_1, f_2; x_1)\exp\left\{-\int_{x_1}^{x_2}p(t)\,dt\right\}.$$

24.3. Show by direct substitution that $f_2$ of Equation (24.6) indeed satisfies (24.4) no matter what $K$ is.

24.4. Show that the solutions to the SOLDE y" + q(x)y = 0 have a constant 
Wronskian. 

24.5. Find a general integral formula for $G_n(x)$, the linearly independent “partner” of the Hermite polynomial $H_n(x)$, which satisfies the Hermite DE

$$y'' - 2xy' + 2ny = 0.$$

Specialize this to $n = 0, 1$. Is it possible to find $G_0(x)$ and $G_1(x)$ in terms of elementary functions?

24.6. Use Theorem 23.3.1 to construct


V = 


W (x) 

7Fm 



W(t ) 


a solution of 


2 ft 
h 


r 

y = 7i 


24.7. Show that each pair of the following functions satisfies the DE next to it. Calculate the Wronskian, and give a solution satisfying the initial conditions $y(0) = 2$ and $y'(0) = 1$.

(a) $\cos x$ and $\sin x$: $y'' + y = 0$. (b) $e^x$ and $e^{3x}$: $y'' - 4y' + 3y = 0$.

(c) $x$ and $e^x$: $y'' + \dfrac{x}{1-x}y' - \dfrac{1}{1-x}y = 0$.


24.8. For the HSOLDE $y'' + py' + qy = 0$, show that

$$p = -\frac{f_1f_2'' - f_2f_1''}{W(f_1, f_2)} \quad\text{and}\quad q = \frac{f_1'f_2'' - f_2'f_1''}{W(f_1, f_2)}.$$

Thus, knowing two solutions of an HSOLDE allows us to reconstruct the DE.

24.9. Show that the HSOLDE $y'' + py' + qy = 0$ can be cast in the form $u'' + S(x)u = 0$. Hint: Define $w(x)$ by $y = wu$, substitute in the DE, and demand that the coefficient of $u'$ be zero to obtain

$$w(x) = \exp\left[-\frac12\int^x p(t)\,dt\right].$$

Now show that the original DE can be written as $u'' + S(x)u = 0$ with

$$S(x) = q + p\frac{w'}{w} + \frac{w''}{w} = q - \frac14p^2 - \frac12p'.$$

24.10. Show that the adjoint of M given in Equation (24.14) is the original L. 

24.11. Show that S-L equation (24.18) can be transformed into

$$\frac{d^2v}{dt^2} + [\lambda - Q(t)]\,v = 0$$

by the so-called Liouville substitution, which changes both independent and dependent variables:

$$u(x) = v(t)\,[p(x)w(x)]^{-1/4}, \qquad t = \int^x\sqrt{\frac{w(s)}{p(s)}}\,ds.$$

Then 
24.12. Show that

(a) the Liouville substitution (see Problem 24.11) transforms the Bessel DE $(xu')' + (k^2x - \nu^2/x)u = 0$ into

$$\frac{d^2v}{dt^2} + \left[k^2 - \frac{\nu^2 - 1/4}{t^2}\right]v = 0.$$

(b) Specialize to $\nu = \tfrac12$ and show that

$$J_{1/2}(kt) = A\frac{\sin kt}{\sqrt t} + B\frac{\cos kt}{\sqrt t}.$$

(c) Use the fact that $J_{1/2}(x)$ is an analytic function of $x$ to show that

$$J_{1/2}(kt) = A\frac{\sin kt}{\sqrt t}.$$

24.13. Show that the functions $x^re^{\lambda x}$, where $r = 0, 1, 2, \ldots, k$, are linearly independent. Hint: Starting with $(D-\lambda)^k$, apply powers of $D-\lambda$ to a linear combination of $x^re^{\lambda x}$ for all possible $r$’s.

24.14. Suppose $\lambda$ is a root of the polynomial

$$p(x) = x^n + a_{n-1}x^{n-1} + \cdots + a_1x + a_0,$$

where all coefficients are real. Show that $\lambda^*$ is also a root of $p(x)$. Hint: Complex conjugate $p(\lambda) = 0$. Does the same result hold if the coefficients were complex?

24.15. Write Equation (24.39) in the more familiar Cartesian coordinates and show that $e = 0$ gives a circle, $0 < e < 1$ gives an ellipse, $e = 1$ gives a parabola, and $e > 1$ gives a hyperbola. Show that except for the case of a parabola, the Cartesian equation of the conic section is

$$\frac{(1-e^2)^2K^2m^2}{L^4}\left(x - \frac{L^2e}{Km(1-e^2)}\right)^2 + \frac{(1-e^2)K^2m^2}{L^4}\,y^2 = 1.$$

24.16. Derive all the formulas in Equation (24.41). 

24.17. Find a basis of real solutions for each DE:

(a) $y'' + 5y' + 6y = 0$. (b) $y''' + 6y'' + 12y' + 8y = 0$.

(c) $y^{(4)} = y$. (d) $y^{(4)} = -y$.

24.18. Solve the following DEs subject to the given initial conditions. 

(a) $y^{(4)} = y$, $y(0) = y'(0) = y''(0) = 0$, $y'''(0) = 1$.

(b) j/ (4) + V" = 0 , 2 /( 0 ) = 2 /"( 0 ) = y"'(0) = 0 , 2 /( 0 ) = 1 . 

(c) $y^{(4)} = 0$, $y(0) = y'(0) = y''(0) = 0$, $y'''(0) = 2$.
24.19. Solve $y'' - 2y' + y = xe^x$ subject to the initial conditions $y(0) = 0$, $y'(0) = 1$.

24.20. Find the general solution of each equation: 

(a) $y'' = xe^x$. (b) $y'' - 4y' + 4y = x^2$.

(c) $y'' + y = \sin x\sin 2x$. (d) $y'' - y = (1 + e^{-x})^2$.

(e) y" — y = e x sin 2x. (f) y = x 2 . 

(g) y"-4y , + 'L = e x + xe 2x . (h )y" + y = e 2x . 

24.21. Consider the Euler equation

$$x^ny^{(n)} + a_{n-1}x^{n-1}y^{(n-1)} + \cdots + a_1xy' + a_0y = r(x).$$

Substitute $x = e^t$ and show that such a substitution reduces this to a DE with constant coefficients. In particular, solve $x^2y'' - 4xy' + 6y = x$.

24.22. Show that $v = C_1\cos\theta + C_2\sin\theta$ can be written as $v = A\cos(\theta - \theta_0)$. Find $A$ and $\theta_0$ in terms of $C_1$ and $C_2$.

24.23. (a) Show that the extremum (maximum or minimum) of the function

$$y(t) = c_1te^{-at/2} + c_0e^{-at/2}$$

occurs at $t = 2/a - c_0/c_1$.

(b) Prove that if $c_1 > 0$, the extremum is a maximum, and if $c_1 < 0$, it is a minimum.

24.24. Verify that, for any differentiable function $f$, we have

$$(D-\lambda)[e^{\lambda x}f(x)] = e^{\lambda x}f'(x)$$

and if $\nu \neq \lambda$, then

$$(D-\nu)[e^{\lambda x}f(x)] = e^{\lambda x}[(\lambda-\nu)f(x) + f'(x)].$$


24.25. Derive Equation (24.44). 




Chapter 25 


Laplace’s Equation: 
Cartesian Coordinates 


In Chapter 22 we discussed the technique of the separation of variables for the most important PDEs encountered in introductory physics and engineering courses. One such PDE deserving special attention is the Laplace equation

$$\nabla^2\Phi = 0, \tag{25.1}$$

which shows up extensively in problems in electrostatics and steady-state heat conduction. The latter arises in situations in which the temperature does not change with time, so that the LHS of Equation (22.3) vanishes.

Aside from its significance in applications, Laplace’s equation is important 
because its solution leads naturally to some of the most famous functions 
of mathematical physics. In fact, when separating this equation in various 
coordinate systems, one obtains not only such elementary functions as sines 
and cosines, but also the more advanced “special functions” such as Legendre 
polynomials and the Bessel functions. At the heart of such functions is the 
linearity of Laplace’s equation, which allows summing an (infinite) number of
solutions to get a new solution. This leads naturally to solutions of Laplace’s 
equation in terms of infinite series. 

In a typical situation, $ is given on some surfaces bounding a volume in 
space and its value is sought for all points in the volume. When the bounding 
surfaces are arbitrarily shaped, the solution can be found only by numerical 
techniques; but when they are primary surfaces of a coordinate system, then 
we can generally solve the problem by separating Laplace’s equation in the 
appropriate coordinate system. 
25.1 Uniqueness of Solutions

We shall see many examples of solutions to Laplace’s equation in various coordinate systems in this and the following chapters. All of these solutions will be obtained in the form of infinite series. So, we know that solutions to Laplace’s equation indeed exist. What we want to do in this section is to show that the solution which satisfies all the boundary conditions is unique. In other words, no matter how we find the solution, as long as it satisfies Laplace’s equation and the boundary conditions, it is the solution. In fact, we can be more general and prove the uniqueness for the Poisson equation $\nabla^2\Phi = \rho$.

Consider the volume $V$ with some surfaces bounding it. Figure 25.1 shows two such volumes. Assume that two functions $\Phi_1$ and $\Phi_2$ satisfy the Poisson equation at every point of the volume, and that they both satisfy some other conditions related to the surfaces, which we shall look into shortly. Let $\Phi = \Phi_1 - \Phi_2$ and note that $\Phi$ satisfies Laplace’s equation because

$$\nabla^2\Phi = \nabla^2(\Phi_1 - \Phi_2) = \nabla^2\Phi_1 - \nabla^2\Phi_2 = \rho - \rho = 0.$$

For any function $f$, we have [see Equation (14.11)]

$$\nabla\cdot(f\nabla f) = \nabla f\cdot\nabla f + f\nabla^2f = |\nabla f|^2 + f\nabla^2f.$$

Since $\Phi$ satisfies Laplace’s equation, we get

$$\nabla\cdot(\Phi\nabla\Phi) = \nabla\Phi\cdot\nabla\Phi + \Phi\nabla^2\Phi = |\nabla\Phi|^2.$$

Integrating both sides of the last equation over the volume $V$ and using the divergence theorem on the LHS yields

$$\begin{aligned}\oint_S\Phi\,(\hat{\mathbf e}_n\cdot\nabla\Phi)\,da &= \int_V\nabla\cdot(\Phi\nabla\Phi)\,dV\\ &= \int_V|\nabla\Phi|^2\,dV.\end{aligned} \tag{25.2}$$

Figure 25.1: A volume (shaded region) with its bounding surface. (a) The volume is “inside” the bounding surface. (b) The volume is “outside” the bounding surface(s).
Now suppose:

• Dirichlet boundary condition: $\Phi_1$ and $\Phi_2$ take on the same value at every point of the bounding surface(s), i.e., $\Phi_1 - \Phi_2 = 0$ on $S$; or

• Neumann boundary condition: the so-called normal derivatives $\hat{\mathbf e}_n\cdot\nabla\Phi_1$ and $\hat{\mathbf e}_n\cdot\nabla\Phi_2$ take on the same value at every point of the bounding surface(s), i.e., $\hat{\mathbf e}_n\cdot\nabla\Phi_1 - \hat{\mathbf e}_n\cdot\nabla\Phi_2 = 0$ on $S$.

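Then, in either case, the first line of Equation (25.2) yields zero, because the integrand $\Phi\,(\hat{\mathbf e}_n\cdot\nabla\Phi)$ of the surface integral vanishes on $S$. Since the integrand of the RHS is never negative, the integrand must vanish.¹ It follows that

$$|\nabla\Phi|^2 = 0 \;\Rightarrow\; \nabla\Phi = 0 \;\Rightarrow\; \Phi = \text{constant} \;\Rightarrow\; \Phi_1 - \Phi_2 = \text{constant}$$

for all points in the volume $V$. For the Dirichlet condition, $\Phi = 0$ on the bounding surface, so the constant must be zero, i.e., $\Phi_1 = \Phi_2$ for all points in the volume $V$. For the Neumann condition, nothing fixes the constant, so $\Phi_1$ and $\Phi_2$ can differ at most by an additive constant. We thus have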


Box 25.1.1. Let $V$ be a volume bounded by a (possibly disconnected) surface $S$. Then there is at most one function which satisfies both Laplace’s equation (or the Poisson equation) at every point of $V$ and Dirichlet boundary conditions on $S$. With Neumann boundary conditions on $S$ instead, such a function is unique up to an additive constant.
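A discrete analog makes the uniqueness statement tangible, as an illustration only and not a proof. The Python sketch below solves the five-point finite-difference version of Laplace’s equation on a small square grid with Dirichlet data by Jacobi iteration, starting from two very different interior guesses; both runs settle on the same grid values, as uniqueness leads one to expect. The grid size, boundary data, tolerance, and function names are arbitrary choices of ours.

```python
def solve_laplace(n, boundary, guess, tol=1e-10, max_iter=100_000):
    """Jacobi iteration for the five-point discrete Laplace equation on an
    n x n grid with Dirichlet data boundary(i, j) on the edge; interior
    points start from guess(i, j)."""
    def on_edge(i, j):
        return i in (0, n - 1) or j in (0, n - 1)
    grid = [[boundary(i, j) if on_edge(i, j) else guess(i, j) for j in range(n)]
            for i in range(n)]
    for _ in range(max_iter):
        new = [row[:] for row in grid]
        change = 0.0
        for i in range(1, n - 1):
            for j in range(1, n - 1):
                # each interior value becomes the average of its four neighbors
                new[i][j] = 0.25 * (grid[i + 1][j] + grid[i - 1][j]
                                    + grid[i][j + 1] + grid[i][j - 1])
                change = max(change, abs(new[i][j] - grid[i][j]))
        grid = new
        if change < tol:
            return grid
    raise RuntimeError("Jacobi iteration did not converge")

# Hypothetical data: value 1 on the top edge, 0 on the other three edges
def top_hot(i, j):
    return 1.0 if i == 0 else 0.0

n = 12
g1 = solve_laplace(n, top_hot, lambda i, j: 0.0)
g2 = solve_laplace(n, top_hot, lambda i, j: 50.0 * (-1) ** (i + j))
diff = max(abs(g1[i][j] - g2[i][j]) for i in range(n) for j in range(n))
print("largest difference between the two converged grids:", diff)
```

The remaining difference reflects only the stopping tolerance of the iteration; the discrete problem, like the continuous one with Dirichlet data, has a single solution.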


Pierre Simon de Laplace was a French mathematician and theoretical astronomer 
who was so famous in his own time that he was known as the Newton of France. His 
main interests throughout his life were celestial mechanics, the theory of probability, 
and personal advancement. 

At the age of 24 he was already deeply engaged in the detailed application of 
Newton’s law of gravitation to the solar system as a whole, in which the planets and 
their satellites are not governed by the Sun alone, but interact with one another 
in a bewildering variety of ways. Even Newton had been of the opinion that di¬ 
vine intervention would occasionally be needed to prevent this complex mechanism 
from degenerating into chaos. Laplace decided to seek reassurance elsewhere, and 
succeeded in proving that the ideal solar system of mathematics is a stable dynam¬ 
ical system that will endure unchanged for all time. This achievement was only 
one of the long series of triumphs recorded in his monumental treatise Mecanique 
Celeste (published in five volumes from 1799 to 1825), which summed up the work 
on gravitation of several generations of illustrious mathematicians. Unfortunately 
for his later reputation, he omitted all reference to the discoveries of his predecessors 
and contemporaries, and left it to be inferred that the ideas were entirely his own. 
Many anecdotes are associated with this work. One of the best known describes 
the occasion on which Napoleon tried to get a rise out of Laplace by protesting that 

1 The integral is the limit of a sum. If no term of this sum is negative, and the sum 
equals zero, then each term of the sum must be zero. 
Pierre Simon de
Laplace 1749-1827


semi-infinite 
electrically 
conducting plates 


he had written a huge book on the system of the world without once mentioning 
God as the author of the universe. Laplace is supposed to have replied, “Sire, I 
had no need of that hypothesis.” The principal legacy of the Mécanique Céleste to
later generations lay in Laplace’s wholesale development of potential theory, with its 
far-reaching implications for a dozen different branches of physical science ranging 
from gravitation and fluid mechanics to electromagnetism and atomic physics. Even 
though he lifted the idea of the potential from Lagrange without acknowledgment, 
he exploited it so extensively that ever since his time the fundamental equation of 
potential theory has been known as Laplace’s equation. 

After the French Revolution, Laplace’s political talents and greed for position 
came to full flower. His compatriots speak ironically of his “suppleness” and “versatility”
as a politician. What this really means is that each time there was a change
of regime (and there were many), Laplace smoothly adapted himself by changing his 
principles—back and forth between fervent republicanism and fawning royalism— 
and each time he emerged with a better job and grander titles. He has been aptly 
compared with the apocryphal Vicar of Bray in English literature, who was twice a 
Catholic and twice a Protestant. The Vicar is said to have replied as follows to the 
charge of being a turncoat: “Not so, neither, for if I changed my religion, I am sure 
I kept true to my principle, which is to live and die the Vicar of Bray.” 

To balance his faults, Laplace was always generous in giving assistance and 
encouragement to younger scientists. From time to time he helped forward in their 
careers such men as the chemist Gay-Lussac, the traveler and naturalist Humboldt, 
the physicist Poisson, and —appropriately—the young Cauchy, who was destined to 
become one of the chief architects of nineteenth-century mathematics. 


25.2 Cartesian Coordinates 


The separation of Laplace’s equation in Cartesian coordinates is obtained
from Equation (22.12) by setting the constant C equal to zero.² This leads
to the following three equations:³

$$\frac{d^2X}{dx^2} - \alpha_1 X = 0, \qquad \frac{d^2Y}{dy^2} - \alpha_2 Y = 0, \qquad \frac{d^2Z}{dz^2} + (\alpha_1 + \alpha_2) Z = 0, \tag{25.3}$$


where the α’s could be any numbers (including zero and complex numbers). The specific
value that each α takes on depends on the boundary conditions (BCs). We
consider bounding surfaces parallel to the coordinate planes.

The most effective way of learning how to solve Laplace’s equation is to 
go into the details of the solution of a number of specific examples. We do 
so in the following, hoping that the reader will examine these examples very 
carefully, taking note of steps taken with an eye on how each step would 
change in a different situation (different BCs, etc.).

Example 25.2.1. Two semi-infinite conducting plates starting on the y-axis and
parallel to the x-axis are grounded (the potential $\Phi$ is zero on them) and separated by

2 Recall from Subsection 22.2 that $\Psi(x, y, z) = X(x)Y(y)Z(z)$.

3 We have changed the sign of the α’s to illustrate how the boundary conditions force on
us the correct functional form of X, Y, and Z.


Figure 25.2: (a) The semi-infinite plates, and (b) the cross section of the two
(grounded) plates and the strip maintained at potential V.


a distance b [Figure 25.2(a)]. Both plates extend from $-\infty$ to $\infty$ in the z-direction. A
conducting strip of width b, also infinite in both directions of the z-axis, is located
between the two plates and separated from them by an infinitesimal gap, so that
the strip can be maintained at a different potential $\Phi = V$. Figure 25.2(b) shows
the cross section of the geometry of the problem. We want to find the potential in
the region enclosed by the conductors.

The potential is independent of z because, as a small observer moves along the
z-axis keeping the other two coordinates fixed, his detectors and instruments will
detect no change in the physics of the problem: the physical environment of the
detectors remains unchanged. So, $Z(z)$ is a constant, which we absorb in $X(x)$
or $Y(y)$. Furthermore, substituting $Z = \text{const.}$ in the third equation of (25.3) yields
$\alpha_1 + \alpha_2 = 0$.

Thus the problem is reduced to finding $X(x)$ and $Y(y)$ which satisfy the differential
equations of (25.3). First let us consider the $Y$ equation. If $\alpha_2 = 0$, then the
solution will be of the form

$$Y(y) = Ay + B.$$

The case of $\alpha_2 \neq 0$ is a SOLDE with constant coefficients whose most general
solution can be written as

$$Y(y) = A e^{\sqrt{\alpha_2}\, y} + B e^{-\sqrt{\alpha_2}\, y}. \tag{25.4}$$

The vanishing of $\Phi$ at $y = 0$ and $y = b$ means that

$$\Phi(x,0) = X(x)Y(0) = 0 \text{ for all } x \;\Rightarrow\; Y(0) = 0,$$

$$\Phi(x,b) = X(x)Y(b) = 0 \text{ for all } x \;\Rightarrow\; Y(b) = 0.$$

Therefore, for the case of $\alpha_2 = 0$, this implies

$$Y(0) = A \times 0 + B = 0 \;\Rightarrow\; B = 0, \qquad Y(b) = Ab + B = Ab + 0 = 0 \;\Rightarrow\; A = 0.$$

Thus, if $\alpha_2 = 0$, we get $Y(y) = 0$ and $\Phi(x,y) = X(x)Y(y) = 0$, which is the trivial
solution.

It follows that if we are interested in nontrivial solutions, we had better assume
that $\alpha_2 \neq 0$. Then, Equation (25.4) gives

$$Y(0) = A + B = 0 \quad\text{and}\quad Y(b) = A e^{\sqrt{\alpha_2}\, b} + B e^{-\sqrt{\alpha_2}\, b} = 0.$$


Symmetry tells us 
that the potential 
is independent 
of z. 
Boundary
conditions force
$\sqrt{\alpha_2}$ to be
imaginary.


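Multiplying the second equation by $e^{\sqrt{\alpha_2}\, b}$ and using $B = -A$, we obtain

$$A\left[e^{2\sqrt{\alpha_2}\, b} - 1\right] = 0 \;\Rightarrow\; A = 0 \quad\text{or}\quad e^{2\sqrt{\alpha_2}\, b} = 1.$$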


The first choice ($A = 0$), together with $A = -B$, yields a trivial solution again. Therefore, we
have to assume that the second choice holds. However, even with the second choice,
if we restrict ourselves to the real numbers, the only solution of $e^{2\sqrt{\alpha_2}\, b} = 1$ would
be $\alpha_2 = 0$, which is a contradiction because we are dealing precisely with the case
of $\alpha_2 \neq 0$. It follows that $\sqrt{\alpha_2}$ must be a complex number. In fact, recalling that
$e^{2n\pi i} = 1$ for any integer $n$, we immediately get

$$2\sqrt{\alpha_2}\, b = 2n\pi i \;\Rightarrow\; \sqrt{\alpha_2}\, b = in\pi \;\Rightarrow\; \alpha_2 = -\left(\frac{n\pi}{b}\right)^2, \qquad n = \pm 1, \pm 2, \ldots.$$


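Note that $n = 0$ is excluded because this choice would make $\alpha_2 = 0$.
We now turn to the $X$ equation. Since $\alpha_1 + \alpha_2 = 0$, we obtain

$$\alpha_1 = -\alpha_2 = \left(\frac{n\pi}{b}\right)^2, \qquad n = \pm 1, \pm 2, \ldots,$$

and

$$\frac{d^2X}{dx^2} - \left(\frac{n\pi}{b}\right)^2 X = 0 \;\Rightarrow\; X(x) = C e^{n\pi x/b} + D e^{-n\pi x/b}.$$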

To be physically meaningful, the potential must remain finite as $x \to +\infty$. It
follows from the last equation that either $n$ is negative and $D = 0$, or $n$ is positive
and $C = 0$. Either choice will lead to the same final result, as the reader may verify.
Choosing positive values of $n$ with $C = 0$, the potential can be written as

$$\Phi_n(x,y) = AD\, e^{-n\pi x/b}\left[e^{in\pi y/b} - e^{-in\pi y/b}\right] = A_n e^{-n\pi x/b} \sin\frac{n\pi y}{b},$$


where we used $B = -A$ and introduced a new constant $A_n$ (equal to $2iAD$). We also subscripted the
potential because for every $n$, we get a different function for $\Phi$. All such functions
are solutions of Laplace’s equation and therefore, so is their sum. In fact, it is only
the sum that is general enough to result in the final solution. We thus write

$$\Phi(x,y) = \sum_{n=1}^{\infty} A_n e^{-n\pi x/b} \sin\frac{n\pi y}{b}. \tag{25.5}$$
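As a quick sanity check on the separated solutions in (25.5), the sketch below (Python, standard library only) applies a five-point finite-difference Laplacian to a single term $e^{-n\pi x/b}\sin(n\pi y/b)$. The helper names `phi_n`, `mismatched`, and `discrete_laplacian`, and the step size `h`, are illustrative assumptions, not part of the text. A product whose separation constants do not satisfy $\alpha_1 + \alpha_2 = 0$ is included for contrast.

```python
import math

def phi_n(x, y, n, b=1.0):
    """Single separated term of Eq. (25.5) with A_n = 1: exp(-n pi x/b) sin(n pi y/b)."""
    k = n * math.pi / b
    return math.exp(-k * x) * math.sin(k * y)

def mismatched(x, y, b=1.0):
    """exp(-pi x/b) sin(2 pi y/b): here alpha_1 + alpha_2 = -3 (pi/b)^2, not zero."""
    return math.exp(-math.pi * x / b) * math.sin(2 * math.pi * y / b)

def discrete_laplacian(f, x, y, h=1e-3):
    """Five-point finite-difference approximation of f_xx + f_yy at (x, y)."""
    return (f(x + h, y) + f(x - h, y) + f(x, y + h) + f(x, y - h) - 4.0 * f(x, y)) / h**2

if __name__ == "__main__":
    for n in range(1, 5):
        # each separated term is harmonic, so this should be close to zero
        print(n, discrete_laplacian(lambda x, y: phi_n(x, y, n), 0.3, 0.4))
    # for the mismatched product the Laplacian is (alpha_1 + alpha_2) f = -3 pi^2 f
    print("mismatched:", discrete_laplacian(mismatched, 0.3, 0.4))
```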


This is a Fourier series in $y$ with $x$-dependent coefficients. The potential will be
completely determined if the constants $A_n$ can be determined. This is where the
last unused piece of information comes in: the potential at $x = 0$ is $V$. Substituting this
information in Equation (25.5) yields

$$V = \Phi(0,y) = \sum_{n=1}^{\infty} A_n \sin\frac{n\pi y}{b},$$

from which $A_n$ can be determined using Fourier series techniques. We leave it
for the reader to show that $A_n = 2V[1 - (-1)^n]/(n\pi)$ (see Problem 25.1), or

$$A_n = \begin{cases} \dfrac{4V}{n\pi} & \text{if } n \text{ is odd}, \\[4pt] 0 & \text{if } n \text{ is even}. \end{cases}$$
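The coefficient formula can be checked numerically. The sketch below (Python, standard library only) approximates $A_n = (2/b)\int_0^b V \sin(n\pi y/b)\,dy$ with composite Simpson's rule and compares it with $2V[1 - (-1)^n]/(n\pi)$. The function names, the number of subintervals, and the sample values $V = 2$, $b = 1.5$ are illustrative choices.

```python
import math

def sine_coefficient(f, n, b, intervals=2000):
    """A_n = (2/b) * integral_0^b f(y) sin(n pi y / b) dy by composite Simpson's rule."""
    if intervals % 2:
        intervals += 1  # Simpson's rule needs an even number of subintervals
    h = b / intervals
    total = 0.0
    for i in range(intervals + 1):
        y = i * h
        weight = 1 if i in (0, intervals) else (4 if i % 2 else 2)
        total += weight * f(y) * math.sin(n * math.pi * y / b)
    return (2.0 / b) * total * h / 3.0

def closed_form(n, V):
    """A_n = 2V[1 - (-1)^n]/(n pi): 4V/(n pi) for odd n, zero for even n."""
    return 2.0 * V * (1 - (-1) ** n) / (n * math.pi)

if __name__ == "__main__":
    V, b = 2.0, 1.5
    for n in range(1, 7):
        print(n, round(sine_coefficient(lambda y: V, n, b), 8), round(closed_form(n, V), 8))
```

Replacing the constant function `lambda y: V` by another boundary profile gives the coefficients needed in Problem 25.3.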
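By writing $n = 2k + 1$ with $k = 0, 1, 2, \ldots$, the potential in the region of interest
becomes

$$\Phi(x,y) = \frac{4V}{\pi} \sum_{k=0}^{\infty} \frac{e^{-(2k+1)\pi x/b}}{2k+1} \sin\frac{(2k+1)\pi y}{b} \tag{25.6}$$

$$= \frac{4V}{\pi}\left[e^{-\pi x/b}\sin\frac{\pi y}{b} + \frac{1}{3} e^{-3\pi x/b}\sin\frac{3\pi y}{b} + \frac{1}{5} e^{-5\pi x/b}\sin\frac{5\pi y}{b} + \cdots\right].$$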

Because of the exponential factor, the series converges very rapidly for $x > 0$, and for large
values of $x$ the potential very quickly drops to zero. Figure 25.3 shows the potential
function (in arbitrary units) as a function of $x$ and $y$. ■
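The following sketch (Python, standard library only) sums the first terms of (25.6). The function name `phi_strip` is illustrative, and the default of 20 terms simply mirrors the truncation used for Figure 25.3. Away from $x = 0$ a handful of terms suffices; exactly on the strip, $x = 0$, the exponential factor is absent and the series converges only slowly, so many more terms are needed there.

```python
import math

def phi_strip(x, y, V=1.0, b=1.0, terms=20):
    """Partial sum of Eq. (25.6): plates at y = 0 and y = b grounded, strip at x = 0 held at V."""
    total = 0.0
    for k in range(terms):
        n = 2 * k + 1  # only odd n contribute, since A_n = 0 for even n
        total += math.exp(-n * math.pi * x / b) * math.sin(n * math.pi * y / b) / n
    return 4.0 * V / math.pi * total

if __name__ == "__main__":
    for x in (0.0, 0.25, 0.5, 1.0):
        print(f"x = {x:4.2f}   Phi(x, b/2) = {phi_strip(x, 0.5, terms=2000):.6f}")
```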

Example 25.2.1 illustrates the general feature of solving Laplace’s equation
by the separation of variables in Cartesian coordinates. This feature works in
other coordinate systems as well. The separation of variables results in some
ODEs which involve parameters (in the case above, the α’s) to be determined
by some of the BCs. All values of these parameters consistent with the BCs
used (in all cases of interest to us, these values will turn out to be labeled by
integers) are allowed and must be taken into account; i.e., an infinite sum over
such parameters (with as yet undetermined coefficients) is to be formed
as the most general solution of Laplace’s equation. By applying the remaining
BCs, the undetermined coefficients can be evaluated, resulting in the unique
solution appropriate for the geometry of the problem. If the geometry extends
to infinity in a certain direction, then the behavior at that infinity is to be
treated as a BC. It is extremely useful to take into account any symmetry of
the problem, as such symmetries will simplify the solution considerably. The
symmetry in the z-direction of Example 25.2.1 saved us the trouble of solving
one of the three ODEs.



Figure 25.3: The potential function inside the semi-infinite box of Figure 25.2 when
only 20 terms of the infinite series are kept. Note how quickly the potential drops to
zero along the x-axis due to the exponential factor.
infinitely long
rectangular heat
conductor


Example 25.2.2. Steady-state heat conduction problems also obey Laplace’s
equation. So, let us consider a rectangular medium infinite in the z-direction,
enclosed by two pairs of parallel slabs of widths a and b as shown in Figure 25.4. The
temperatures of the slabs of width a (assumed parallel to the x-axis) are zero. The
temperatures of the other two slabs are $T_1$ and $T_2$. We want to find the temperature
at all points in the enclosed region after equilibrium is reached.

As in Example 25.2.1, we can ignore the z-dependence and write $T(x,y) =
X(x)Y(y)$, where $X$ and $Y$ satisfy Equation (25.3) with $\alpha_1 = -\alpha_2$. For exactly the
same reason as in Example 25.2.1, $\alpha_2$ cannot be zero, and $Y$ can only be of the form

$$Y(y) = A_n \sin\frac{n\pi y}{b}, \qquad n = 1, 2, \ldots,$$

where the subscript on $A_n$ reminds us that different constants can be chosen to
multiply different sine functions. The solution for $X$ will, however, be different. We
still have

$$X(x) = C_n e^{n\pi x/b} + D_n e^{-n\pi x/b}, \qquad n = 1, 2, \ldots,$$

but neither $C_n$ nor $D_n$ is zero this time. Multiplying the two functions and redefining
the constants, we can write

$$T_n(x,y) = \left(A_n e^{n\pi x/b} + B_n e^{-n\pi x/b}\right) \sin\frac{n\pi y}{b},$$

and the most general infinite series solution becomes

$$T(x,y) = \sum_{n=1}^{\infty} \left(A_n e^{n\pi x/b} + B_n e^{-n\pi x/b}\right) \sin\frac{n\pi y}{b}. \tag{25.7}$$

So far, we have used only two of the four BCs. The remaining two will determine
the unknowns $A_n$ and $B_n$. Substituting these BCs yields the following two
equations:

$$T_1 = T(0,y) = \sum_{n=1}^{\infty} \underbrace{(A_n + B_n)}_{\equiv E_n} \sin\frac{n\pi y}{b} = \sum_{n=1}^{\infty} E_n \sin\frac{n\pi y}{b},$$

$$T_2 = T(a,y) = \sum_{n=1}^{\infty} \underbrace{\left(A_n e^{n\pi a/b} + B_n e^{-n\pi a/b}\right)}_{\equiv F_n} \sin\frac{n\pi y}{b} = \sum_{n=1}^{\infty} F_n \sin\frac{n\pi y}{b},$$



Figure 25.4: The cross section of the two pairs of parallel slabs maintained at different 
temperatures. 
where we have redefined the constants multiplying the sine functions. As in Example
25.2.1 and Problem 25.1, we have

$$E_n = \frac{2T_1}{n\pi}\left[1 - (-1)^n\right], \qquad F_n = \frac{2T_2}{n\pi}\left[1 - (-1)^n\right].$$

These relations show that only the odd terms of the infinite series are relevant, and
they are given by

$$E_{2k+1} \equiv A_{2k+1} + B_{2k+1} = \frac{4T_1}{\pi(2k+1)},$$

$$F_{2k+1} \equiv A_{2k+1} e^{(2k+1)\pi a/b} + B_{2k+1} e^{-(2k+1)\pi a/b} = \frac{4T_2}{\pi(2k+1)}. \tag{25.8}$$


These are two equations in two unknowns, which can be solved to get

$$A_{2k+1} = \frac{2\left(T_2 - T_1 e^{-(2k+1)\pi a/b}\right)}{\pi(2k+1)\sinh[(2k+1)\pi a/b]}, \qquad B_{2k+1} = \frac{2\left(T_1 e^{(2k+1)\pi a/b} - T_2\right)}{\pi(2k+1)\sinh[(2k+1)\pi a/b]}.$$


Substituting in Equation (25.7), with $n$ replaced by $2k + 1$, and rearranging terms
yields

$$T(x,y) = \frac{4}{\pi} \sum_{k=0}^{\infty} \frac{T_1 \sinh\left[\dfrac{(2k+1)\pi(a-x)}{b}\right] + T_2 \sinh\left[\dfrac{(2k+1)\pi x}{b}\right]}{(2k+1)\sinh\left[\dfrac{(2k+1)\pi a}{b}\right]} \sin\frac{(2k+1)\pi y}{b}. \tag{25.9}$$


The reader is urged to verify that when $T_2 = 0$ and $a \to \infty$, we recover the result
of Example 25.2.1 (with $V$ replaced by $T_1$), as we should. Figure 25.5 shows the
temperature (in arbitrary units) as a function of $x$ and $y$. ■
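A direct way to check (25.9) is to evaluate it and confirm the boundary values. The sketch below (Python, standard library only) does so; to avoid overflowing $\sinh$ for large $k$, it evaluates each ratio $\sinh u/\sinh c$ in the equivalent form $e^{u-c}(1 - e^{-2u})/(1 - e^{-2c})$. The names `sinh_ratio` and `temperature_2522` and the default truncation are illustrative, not from the text.

```python
import math

def sinh_ratio(u, c):
    """sinh(u)/sinh(c) for 0 <= u <= c, c > 0, written so that large arguments do not overflow."""
    return math.exp(u - c) * (1.0 - math.exp(-2.0 * u)) / (1.0 - math.exp(-2.0 * c))

def temperature_2522(x, y, T1, T2, a, b, terms=200):
    """Partial sum of Eq. (25.9): sides y = 0 and y = b at zero, x = 0 at T1, x = a at T2."""
    total = 0.0
    for k in range(terms):
        m = 2 * k + 1
        c = m * math.pi * a / b
        radial = (T1 * sinh_ratio(m * math.pi * (a - x) / b, c)
                  + T2 * sinh_ratio(m * math.pi * x / b, c))
        total += radial * math.sin(m * math.pi * y / b) / m
    return 4.0 / math.pi * total
```

Setting $T_2 = 0$ and taking $a$ large makes the values approach those of (25.6) with $V$ replaced by $T_1$, which is the check requested in Problem 25.7.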


The examples treated so far may give the impression that $\alpha_1$ or $\alpha_2$ is
never zero. This has to do with the specific BCs imposed on $\Phi$ (or $T$). In



Figure 25.5: The temperature function inside the box of Figure 25.4 for the special case
of $a = b$ and $T_1 = T_2$ when only 20 terms of the infinite series are kept.
both examples, $Y$ vanishes at both $y = 0$ and $y = b$. Such a BC excludes
$\alpha_2 = 0$ because the corresponding $Y$, namely $Y = Ay + B$, cannot satisfy
those conditions unless $Y = 0$ identically.

Example 25.2.3. To see how the $\alpha_1 = -\alpha_2 = 0$ terms can enter the game, let
us modify the temperatures of the plates and strips of Example 25.2.2 so that the
bottom plate and the left strip are held at $T = 0$ while the top plate is held at $T_1$
and the right strip at $T_2$.

Let us write the most general solution of Laplace’s equation obtained from separating
variables, including the $\alpha_1 = 0 = -\alpha_2$ term. Since the nonzero $\alpha_1$ and
$\alpha_2$ are of opposite signs, one of them will be positive and will have real square
roots, and the other will have pure imaginary roots. Let us assume that $\alpha_1$ is positive.
Then $X$ will be of exponential type and $Y$ of imaginary exponential, or trigonometric,
type. It follows that the most general solution of Laplace’s equation can be
written as

$$T(x,y) = (A_0 x + B_0)(C_0 y + D_0) + \sum_{\alpha} \left(A_\alpha e^{\sqrt{\alpha}\, x} + B_\alpha e^{-\sqrt{\alpha}\, x}\right)\left[C_\alpha \sin(\sqrt{\alpha}\, y) + D_\alpha \cos(\sqrt{\alpha}\, y)\right], \tag{25.10}$$

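where we have used $\alpha$ for $\alpha_1 = -\alpha_2$. It is convenient to impose the $y$ BCs first. So,
since $T(x,0) = 0$, we have

$$0 = (A_0 x + B_0) D_0 + \sum_{\alpha} \left(A_\alpha e^{\sqrt{\alpha}\, x} + B_\alpha e^{-\sqrt{\alpha}\, x}\right) D_\alpha,$$

which should hold for arbitrary values of $x$. Since the functions $x$, $1$, and $e^{\pm\sqrt{\alpha}\, x}$
are linearly independent, every coefficient must vanish; barring trivial solutions, this can
happen only if $D_0 = D_\alpha = 0$. So, absorbing the multiplicative constants $C_0$ and $C_\alpha$
into the $A$’s and $B$’s, we get a new expression for the temperature:

$$T(x,y) = (A_0 x + B_0)\, y + \sum_{\alpha} \left(A_\alpha e^{\sqrt{\alpha}\, x} + B_\alpha e^{-\sqrt{\alpha}\, x}\right) \sin(\sqrt{\alpha}\, y).$$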


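The other $y$ BC gives

$$T_1 = (A_0 x + B_0)\, b + \sum_{\alpha} \left(A_\alpha e^{\sqrt{\alpha}\, x} + B_\alpha e^{-\sqrt{\alpha}\, x}\right) \sin(\sqrt{\alpha}\, b).$$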


The importance of the
$\alpha_1 = 0$ term is
displayed here by
the relation
between $B_0$ and
$T_1$.


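For this to hold for arbitrary $x$, we need to have

$$A_0 = 0, \qquad B_0 b = T_1 \;\Rightarrow\; B_0 = \frac{T_1}{b}, \qquad \sin(\sqrt{\alpha}\, b) = 0 \;\Rightarrow\; \alpha = \left(\frac{n\pi}{b}\right)^2.$$

The temperature function reduces to

$$T(x,y) = \frac{T_1}{b}\, y + \sum_{n=1}^{\infty} \left(A_n e^{n\pi x/b} + B_n e^{-n\pi x/b}\right) \sin\frac{n\pi y}{b}.$$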


We now impose the other two BCs. These give us the following two equations:

$$0 = T(0,y) = \frac{T_1}{b}\, y + \sum_{n=1}^{\infty} (A_n + B_n) \sin\frac{n\pi y}{b},$$

$$T_2 = T(a,y) = \frac{T_1}{b}\, y + \sum_{n=1}^{\infty} \left(A_n e^{n\pi a/b} + B_n e^{-n\pi a/b}\right) \sin\frac{n\pi y}{b}.$$
Multiplying both sides of these equations by $\sin(m\pi y/b)$ and integrating from 0 to
$b$ yields the following two equations for $A_m$ and $B_m$:

$$A_m + B_m = \frac{2(-1)^m T_1}{m\pi}, \qquad A_m e^{m\pi a/b} + B_m e^{-m\pi a/b} = \frac{2}{m\pi}\left[T_2 + (-1)^m (T_1 - T_2)\right]. \tag{25.11}$$


These two equations can be solved to obtain the remaining unknown coefficients $A_m$
and $B_m$. ■
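The system (25.11) is small enough to solve mode by mode. The sketch below (Python, standard library only) does this with Cramer's rule in `coeffs_2523`. For evaluating the temperature it uses the algebraically equivalent form $A_m e^{u} + B_m e^{-u} = [p_m \sinh(c - u) + q_m \sinh u]/\sinh c$, where $p_m$ and $q_m$ are the right-hand sides of (25.11), $u = m\pi x/b$, and $c = m\pi a/b$; this avoids overflowing exponentials for large $m$. All function names are illustrative assumptions.

```python
import math

def rhs_2511(m, T1, T2):
    """Right-hand sides p_m, q_m of the two equations (25.11)."""
    p = 2.0 * (-1) ** m * T1 / (m * math.pi)
    q = 2.0 / (m * math.pi) * (T2 + (-1) ** m * (T1 - T2))
    return p, q

def coeffs_2523(m, T1, T2, a, b):
    """Solve A_m + B_m = p_m, A_m e^c + B_m e^-c = q_m by Cramer's rule (fine for moderate m)."""
    p, q = rhs_2511(m, T1, T2)
    c = m * math.pi * a / b
    det = math.exp(-c) - math.exp(c)
    return (p * math.exp(-c) - q) / det, (q - p * math.exp(c)) / det

def sinh_ratio(u, c):
    """sinh(u)/sinh(c) for 0 <= u <= c, c > 0, without overflow."""
    return math.exp(u - c) * (1.0 - math.exp(-2.0 * u)) / (1.0 - math.exp(-2.0 * c))

def temperature_2523(x, y, T1, T2, a, b, terms=200):
    """T(x, y) = (T1/b) y + sum_m (A_m e^{m pi x/b} + B_m e^{-m pi x/b}) sin(m pi y/b)."""
    total = T1 * y / b  # contribution of the alpha_1 = 0 term
    for m in range(1, terms + 1):
        p, q = rhs_2511(m, T1, T2)
        c = m * math.pi * a / b
        u = m * math.pi * x / b
        total += (p * sinh_ratio(c - u, c) + q * sinh_ratio(u, c)) * math.sin(m * math.pi * y / b)
    return total
```

Without the term $T_1 y/b$ the bottom and top boundary values could not both be met, which is the point made in the remarks that follow.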


A couple of remarks are in order. The preceding example clearly illustrated
the importance of the $\alpha_1 = 0$ term: had we not included it in the expansion
of $T$, we would not have obtained the answer. This is overlooked in most
elementary treatments of Laplace’s equation. It is worthwhile to emphasize
this point.


Box 25.2.1. Always start with the most general solution of Laplace’s
equation, including the term corresponding to the case in which the
constants of the separation of variables are zero, as given in Equation (25.10).
Then apply the BCs, keeping in mind that there may be a preferred order
for such an application.


In Example 25.2.3, applying the $y$ BCs first was the preferred order.

The second remark has to do with the choice of the functional form of 
X and Y. In Example 25.2.3, we chose X to be exponential and Y to be 
trigonometric. We could just as well have chosen Y to be exponential and X 
to be trigonometric. The appearance of the series would have changed, but 
the value of T at any point in the region of interest would have been the same 
for both series. This is due to the uniqueness of the solution of Laplace’s
equation.⁴

Example 25.2.4. The examples treated so far have been exclusively two-dimensional.
We now consider a three-dimensional problem. Although this particular
problem can be solved more quickly by relying on our intuition (as we did in
Example 25.2.1, for example), we shall start from the most general solution, as prescribed
by Box 25.2.1.

Suppose that the four lateral sides, of widths a and b, of a semi-infinite rectangular
conducting tube are grounded and the closed base is held at potential V. The cross
section of this tube is shown in Figure 25.4, where it is assumed that the tube starts
at z = 0 and extends to infinity in the positive z-direction. We are interested in
finding the potential inside this tube.

4 The representation of the same function by different series should be familiar to the 
reader from calculus where f(x) can be written as a Taylor expansion about any point in 
its domain of definition. Although such expansions look different, they all represent the 
same function. 


a three- 
dimensional 
example of the 
application of 
Laplace's equation 
the four
alternatives for
constants $\alpha_1$ and
$\alpha_2$

Boundary
conditions severely
restrict the terms
of the infinite
sums in (25.12).


We start with Equation (25.3), which holds for all solutions of Laplace’s equation
in Cartesian coordinates. There are four different cases to consider:

1. $\alpha_1 = 0 = \alpha_2$: In this case, $X(x)$ is of the generic form $Ax + B$, and with $y$ or
$z$ replacing $x$, this is also the generic form of $Y$ and $Z$. Let us denote these
solutions as $X_0$, $Y_0$, and $Z_0$.

2. $\alpha_1 = 0$, $\alpha_2 \neq 0$: In this case, $X(x)$ is of the generic form $Ax + B$. But $Y$ and
$Z$ are either exponential or trigonometric. Let us denote these solutions as
$X_0$, $Y_{\alpha_2}$, and $Z_{\alpha_2}$.

3. $\alpha_1 \neq 0$, $\alpha_2 = 0$: In this case, $Y(y)$ is of the generic form $Ay + B$. But $X$ and
$Z$ are either exponential or trigonometric. Let us denote these solutions as
$Y_0$, $X_{\alpha_1}$, and $Z_{\alpha_1}$.

4. $\alpha_1 \neq 0$, $\alpha_2 \neq 0$: In this case $X$, $Y$, and $Z$ are either exponential or trigonometric.
Let us denote these solutions as $X_{\alpha_1}$, $Y_{\alpha_2}$, and $Z_{\alpha_1+\alpha_2}$.

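The most general solution for the potential, encompassing all values of $\alpha_1$ and $\alpha_2$, is

$$\Phi(x,y,z) = X_0(x)Y_0(y)Z_0(z) + X_0(x)\sum_{\alpha_2 \neq 0} Y_{\alpha_2}(y) Z_{\alpha_2}(z) + Y_0(y)\sum_{\alpha_1 \neq 0} X_{\alpha_1}(x) Z_{\alpha_1}(z) + \sum_{\alpha_1 \neq 0}\sum_{\alpha_2 \neq 0} X_{\alpha_1}(x) Y_{\alpha_2}(y) Z_{\alpha_1+\alpha_2}(z). \tag{25.12}$$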

We now apply the BCs. Since $\Phi(0,y,z) = \Phi(a,y,z) = 0$ for arbitrary $y$ and $z$,
and since each term in Equation (25.12) is independent of all others, we conclude
that $X_0(0) = 0 = X_0(a)$ for the $X_0$ of each of the first two terms. It follows that $A$ and $B$
are both zero for these linear functions, so $X_0(x) = 0$. Similarly, $Y_0(y) = 0$, and $\Phi$ is reduced
to the last term (the double sum) of (25.12). Furthermore, since both $X_{\alpha_1}$ and $Y_{\alpha_2}$
vanish at the two ends of their respective ranges, we expect them to be periodic,
i.e., of trigonometric type; by (25.3), this requires $\alpha_1 < 0$ and $\alpha_2 < 0$. So, the most general
solution is now

$$\Phi(x,y,z) = \sum_{\alpha_1,\alpha_2}\left[A_{\alpha_1}\cos\left(\sqrt{|\alpha_1|}\, x\right) + B_{\alpha_1}\sin\left(\sqrt{|\alpha_1|}\, x\right)\right]\left[C_{\alpha_2}\cos\left(\sqrt{|\alpha_2|}\, y\right) + D_{\alpha_2}\sin\left(\sqrt{|\alpha_2|}\, y\right)\right] Z_{\alpha_1+\alpha_2}(z).$$


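If this is to vanish at $x = 0$ for arbitrary $y$ and $z$, then $A_{\alpha_1}$ must be zero; and if
$\Phi(a,y,z) = 0$ for all $y$ and $z$, then all coefficients of the product of the $y$ and $z$
functions in the sum must be zero. These coefficients (after setting $A_{\alpha_1}$ equal to
zero) are of the form $B_{\alpha_1}\sin\left(\sqrt{|\alpha_1|}\, a\right)$. It follows that

$$\sqrt{|\alpha_1|}\, a = m\pi \;\Rightarrow\; \alpha_1 = -\left(\frac{m\pi}{a}\right)^2, \qquad m = 1, 2, \ldots,$$

where, as in the previous examples, we have excluded the negative values of $m$ (they merely
reproduce the same sine functions up to a sign, which can be absorbed in the constants). An
entirely analogous reasoning leads to $C_{\alpha_2} = 0$ and

$$\sqrt{|\alpha_2|}\, b = n\pi \;\Rightarrow\; \alpha_2 = -\left(\frac{n\pi}{b}\right)^2, \qquad n = 1, 2, \ldots.$$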


The $z$-dependence is exponential, since the third equation of (25.3) now reads
$d^2Z/dz^2 = \pi^2(m^2/a^2 + n^2/b^2)Z$; and since the potential cannot diverge at large
values of $z$, the positive exponent will be absent. Absorbing all multiplicative constants
into a single doubly indexed one, we can now write

$$\Phi(x,y,z) = \sum_{m,n=1}^{\infty} A_{mn} \sin\frac{m\pi x}{a} \sin\frac{n\pi y}{b}\, e^{-\pi\sqrt{m^2/a^2 + n^2/b^2}\, z}. \tag{25.13}$$
The unknown constants $A_{mn}$ are determined by using the last BC:

$$V = \Phi(x,y,0) = \sum_{m,n=1}^{\infty} A_{mn} \sin\frac{m\pi x}{a} \sin\frac{n\pi y}{b}. \tag{25.14}$$

a two-dimensional
Fourier series

This is a double Fourier series.


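Theorem 25.2.5. The coefficients of the double Fourier series (25.14) can be
calculated by multiplying both sides by $\sin(j\pi x/a)\sin(k\pi y/b)$ and integrating the result
from 0 to $a$ in the $x$ variable and from 0 to $b$ in $y$:

$$A_{jk} = \frac{4}{ab}\int_0^a\!\!\int_0^b \Phi(x,y,0) \sin\frac{j\pi x}{a} \sin\frac{k\pi y}{b}\, dx\, dy.$$

It now follows that

$$A_{jk} = \frac{4V}{ab}\int_0^a\!\!\int_0^b \sin\frac{j\pi x}{a} \sin\frac{k\pi y}{b}\, dx\, dy = \frac{4V}{ab}\int_0^a \sin\frac{j\pi x}{a}\, dx \int_0^b \sin\frac{k\pi y}{b}\, dy = \frac{4V}{ab}\cdot\frac{a[1-(-1)^j]}{j\pi}\cdot\frac{b[1-(-1)^k]}{k\pi},$$

or

$$A_{jk} = \frac{4V}{\pi^2}\,\frac{1-(-1)^j}{j}\,\frac{1-(-1)^k}{k}.$$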
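Theorem 25.2.5 translates directly into a numerical recipe. The sketch below (Python, standard library only) approximates the double integral with a tensor-product composite Simpson rule and, for the constant boundary value $V$, compares the result with the closed form just obtained. The function names, the grid size `n`, and the sample dimensions are illustrative assumptions; any other boundary profile $\Phi(x,y,0)$ can be passed as `f`.

```python
import math

def simpson_weights(n):
    """Composite Simpson weights (without the h/3 factor) for an even number n of subintervals."""
    return [1 if i in (0, n) else (4 if i % 2 else 2) for i in range(n + 1)]

def double_sine_coefficient(f, j, k, a, b, n=200):
    """A_jk = (4/(a b)) * integral over [0,a]x[0,b] of f(x, y) sin(j pi x/a) sin(k pi y/b),
    approximated with n (even) Simpson subintervals per axis."""
    if n % 2:
        raise ValueError("n must be even for Simpson's rule")
    hx, hy = a / n, b / n
    w = simpson_weights(n)
    total = 0.0
    for ix, wx in enumerate(w):
        x = ix * hx
        sx = math.sin(j * math.pi * x / a)
        for iy, wy in enumerate(w):
            y = iy * hy
            total += wx * wy * f(x, y) * sx * math.sin(k * math.pi * y / b)
    return 4.0 / (a * b) * total * (hx / 3.0) * (hy / 3.0)

def closed_form_jk(j, k, V):
    """A_jk = (4V/pi^2) [1 - (-1)^j]/j * [1 - (-1)^k]/k."""
    return 4.0 * V / math.pi**2 * (1 - (-1) ** j) / j * (1 - (-1) ** k) / k
```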

It is clear that only the odd terms of the double sum will contribute. Thus, the final
answer for the potential inside [Equation (25.13)] is

$$\Phi(x,y,z) = \frac{16V}{\pi^2}\sum_{m,n=0}^{\infty} \frac{\sin[(2m+1)\pi x/a]}{2m+1}\,\frac{\sin[(2n+1)\pi y/b]}{2n+1}\, e^{-\pi\sqrt{(2m+1)^2/a^2 + (2n+1)^2/b^2}\, z}.$$

By its very construction, this function satisfies Laplace’s equation as well as all the
BCs. Therefore, by the uniqueness theorem, it must represent the unique potential
for the region of interest. ■
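The final series can be evaluated by truncating both sums. The sketch below (Python, standard library only) does this; the function name `phi_tube` and the default truncation are illustrative. For $z > 0$ the exponential factor makes the double sum converge quickly, while on the base $z = 0$ convergence is slow, just as on the strip in Example 25.2.1.

```python
import math

def phi_tube(x, y, z, V=1.0, a=1.0, b=1.0, terms=50):
    """Partial sum of the final series of Example 25.2.4 (odd modes 2m+1, 2n+1 only)."""
    total = 0.0
    for m in range(terms):
        p = 2 * m + 1
        sx = math.sin(p * math.pi * x / a) / p
        for n in range(terms):
            q = 2 * n + 1
            decay = math.exp(-math.pi * math.sqrt(p**2 / a**2 + q**2 / b**2) * z)
            total += sx * math.sin(q * math.pi * y / b) / q * decay
    return 16.0 * V / math.pi**2 * total
```

Far from the base only the $m = n = 0$ term survives, so the potential on the axis of a square tube falls off roughly like $(16V/\pi^2)e^{-\pi\sqrt{2}\,z/a}$.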


25.3 Problems 

25.1. Given that $V = \sum_{n=1}^{\infty} A_n \sin(n\pi y/b)$, where $V$ is a constant in the
interval $(0, b)$, show that $A_n = 2V[1 - (-1)^n]/(n\pi)$.

25.2. A long hollow cylinder with square cross section of side a has three sides
grounded and the fourth side maintained at potential $V_0$ (see Figure 25.6).
Find the potential at all points inside.

25.3. Example 25.2.1 treated the case in which the plate at x = 0 was held at 
the constant potential V. Now suppose that it is held at a potential that varies 
with y. Use Equation (25.5) to find the potential as a function of x and y when 
Figure 25.6: The cross section of the conducting cylinder extended along the z-axis.

(a) <F(0 ,y) = j^y(y-b) 

(b) <f>(0 ,y) = yy 

(c) <3>(0, y) = Vq sin y. 

25.4. In Example 25.2.2, we assumed constant temperatures for the left and 
right plates. Now suppose that the top and bottom plates are as before, but 
the left plate is held at a varying temperature given by 

T(0, y) = yy(y ~ b). 

Use Equation (25.7) to find the temperature as a function of x and y when 

(a) T(a,y) = 0; 

(b) T(a,y) = yy(y - b); 

(c) $T(a,y) = T_0$;

(d) T(a,y) = y y; 

(e) T(a,y) = T 0 sin . 

b 

25.5. Suppose that the top and bottom plates of Example 25.2.2 are as before, 
but the left plate is held at a varying temperature given by 

T(0,y) = T 0 sin . 

Use Equation (25.7) to find the temperature as a function of x and y when 

(a) T(a,y) = 0; 

(b) T(a,y) = y y{y - 6); 

(c) T(a,y) = T 0 ; 
(d) T(a,y) = yy;

(e) T(a, y) = T 0 sin y. 

25.6. Solve Equation (25.8) for $A_{2k+1}$ and $B_{2k+1}$ and substitute in (25.7) to
obtain (25.9).

25.7. Verify that when $T_2 = 0$ and $a \to \infty$, Equation (25.9) approaches the
result of Example 25.2.1, with $V$ replaced by $T_1$.

25.8. Derive Equation (25.11). Assume that $T_1 = T_2 = T_0$ and solve for $A_m$
and $B_m$.

25.9. Obtain the expression for $A_{jk}$ in Theorem 25.2.5.

25.10. Find the potential inside a cube with sides of length a when the top
side is held at a constant potential $V_0$ with all other sides grounded (zero
potential).

25.11. Find the electrostatic potential inside a cube with sides of length a if 
all faces are grounded except the top, which is held at a potential given by: 

(a) —x, 0 < x < a. 

(c) -%xy, 0 < x,y < a. 

a z 


(b) —y, 0 < y < a. 

(d) Vq sin (—x\ , 0 < x < a. 




Chapter 26 


Laplace’s Equation: 
Spherical Coordinates 


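The separation of Laplace’s equation in spherical coordinates is obtained from
Equation (22.16) by substituting $f(r) = 0$. This will yield¹

$$\frac{1}{r^2}\frac{d}{dr}\left(r^2\frac{dR}{dr}\right) - \frac{\alpha}{r^2}R = 0, \qquad \frac{1}{\sin\theta}\frac{d}{d\theta}\left(\sin\theta\frac{d\Theta}{d\theta}\right) - \frac{\beta}{\sin^2\theta}\Theta + \alpha\Theta = 0, \qquad \frac{d^2S}{d\varphi^2} + \beta S = 0. \tag{26.1}$$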


We consider the case where $S$ is the constant function.² This corresponds
to problems with azimuthal symmetry, i.e., problems for which it is a
priori clear that the potential is independent of the azimuthal angle $\varphi$. For
such situations, the third equation in (26.1) implies that $\beta = 0$ because $S$ is a
(nonzero) constant. The independent variables are reduced to two and, with
$\Phi(r,\theta) = R(r)\Theta(\theta)$, the remaining ODEs simplify to


azimuthal
symmetry means
independence
from $\varphi$


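$$\frac{1}{r^2}\frac{d}{dr}\left(r^2\frac{dR}{dr}\right) - \frac{\alpha}{r^2}R = 0, \qquad \frac{1}{\sin\theta}\frac{d}{d\theta}\left(\sin\theta\frac{d\Theta}{d\theta}\right) + \alpha\Theta = 0. \tag{26.2}$$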


We shall now concentrate on the second equation and come back to the first 
after we have found solutions to the second. 

1 Here we have changed the symbol of the azimuthal function to $S$ so that $\Psi(r,\theta,\varphi) =
R(r)\Theta(\theta)S(\varphi)$.

2 The case in which $S$ is not constant (so that $\Phi$ depends on the azimuthal angle) is
more complicated and will not be pursued here. Instead, the interested reader is referred
to Hassani, S. Mathematical Physics: A Modern Introduction to Its Foundations, Springer-Verlag,
1999, Chapter 12 for details.
Legendre
differential
equation


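The appearance of $\sin\theta\, d\theta$ (which, up to sign, is the differential of $\cos\theta$) in the denominator
suggests changing the independent variable from $\theta$ to $u = \cos\theta$. For any
function $f(\theta)$, the chain rule gives³

$$\frac{df}{du} = \frac{df}{d\theta}\frac{d\theta}{du} = \frac{df}{d\theta}\frac{1}{du/d\theta} = -\frac{1}{\sin\theta}\frac{df}{d\theta} \quad\text{or}\quad \frac{df}{d\theta} = -\sin\theta\,\frac{df}{du}, \tag{26.3}$$

which allows us to convert the derivative of a function with respect to $u$ to
the derivative with respect to $\theta$ and vice versa.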

Introduce a new function $P(u)$ such that $P(u) = \Theta(\theta)$. Using the chain
rule, substituting in the second equation of (26.2), and writing $\sin^2\theta = 1 - u^2$,
the DE becomes

$$\frac{1}{\sin\theta}\frac{d}{d\theta}\left[-(1-u^2)\frac{dP}{du}\right] + \alpha P = 0.$$

The term in the square brackets is a function of $u$. So, by Equation (26.3),
we can convert the $\theta$-derivative to a $u$-derivative and obtain

$$\frac{d}{du}\left[(1-u^2)\frac{dP}{du}\right] + \alpha P = 0, \tag{26.4}$$


which can also be written as

$$(1-u^2)\frac{d^2P}{du^2} - 2u\frac{dP}{du} + \alpha P = 0 \tag{26.5}$$

or

$$\frac{d^2P}{du^2} - \frac{2u}{1-u^2}\frac{dP}{du} + \frac{\alpha}{1-u^2}P = 0. \tag{26.6}$$


Equations (26.4), (26.5), and (26.6) are equivalent forms of the Legendre equation. We
shall solve this DE using the so-called Frobenius method, or method of
undetermined coefficients.
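To see that the change of variable $u = \cos\theta$ really turns the second equation of (26.2) into (26.5), the sketch below (Python, standard library only) takes the polynomial $P(u) = (3u^2 - 1)/2$, which satisfies (26.5) with $\alpha = 6$ (direct substitution gives $3 - 3u^2 - 6u^2 + 9u^2 - 3 = 0$), and evaluates the residual of both forms: the $\theta$ form through nested central differences and the $u$ form exactly. The polynomial, the step size, and the function names are illustrative choices.

```python
import math

def P(u):
    return 0.5 * (3.0 * u**2 - 1.0)

def dP(u):
    return 3.0 * u

def d2P(u):
    return 3.0

def residual_u(u, alpha=6.0):
    """Left-hand side of Eq. (26.5): (1 - u^2) P'' - 2u P' + alpha P."""
    return (1.0 - u**2) * d2P(u) - 2.0 * u * dP(u) + alpha * P(u)

def residual_theta(theta, alpha=6.0, h=1e-4):
    """Left-hand side of the second Eq. (26.2) for Theta(theta) = P(cos theta),
    using nested central differences; theta must stay away from 0 and pi."""
    def Theta(t):
        return P(math.cos(t))

    def sin_times_dTheta(t):
        return math.sin(t) * (Theta(t + h) - Theta(t - h)) / (2.0 * h)

    outer = (sin_times_dTheta(theta + h) - sin_times_dTheta(theta - h)) / (2.0 * h)
    return outer / math.sin(theta) + alpha * Theta(theta)
```

Both residuals vanish (up to discretization error in the $\theta$ form), while any other value of `alpha` leaves a residual of $(\alpha - 6)\Theta$.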


26.1 Frobenius Method 

The basic assumption of the Frobenius method is that the solution of the DE
can be represented by a power series. This is not a restrictive assumption,
because the functions encountered in physical applications can, as a rule, be
written as power series, as long as we are interested in their values at points
lying in the interval of convergence. This interval may be very small or it may
cover the entire real line.

A second-order homogeneous linear DE can be written as

$$p_2(x)\frac{d^2y}{dx^2} + p_1(x)\frac{dy}{dx} + p_0(x)\,y = 0. \tag{26.7}$$


3 Note that $f$ can be considered a function of $u$ as well as of $\theta$.
For almost all applications encountered in physics (certainly in this book),
$p_0$, $p_1$, and $p_2$ are polynomials.⁴ The first step in the implementation of the
Frobenius method is to assume an infinite power series for $y$. It is common
to choose the point of expansion to be $x = 0$. If $p_2(0) \neq 0$, only nonnegative
powers of $x$ need be considered.⁵ If $p_2(0) = 0$, the DE loses its character of
being “second order” there, and the solutions we are seeking may not be defined
at that point. In such a case, we have two choices:

1. choose a different point of expansion $x_0 \neq 0$ so that $p_2(x_0) \neq 0$; or

2. allow negative or nonintegral powers of $x$ in the expansion of $y$.

The first choice is rarely used. It turns out that the most economical, yet
general, way of incorporating the second choice is to write the solution as

$$y = x^r \sum_{n=0}^{\infty} a_n x^n = \sum_{n=0}^{\infty} a_n x^{n+r} = a_0 x^r + a_1 x^{r+1} + a_2 x^{r+2} + a_3 x^{r+3} + \cdots, \tag{26.8}$$

where $r$ is a real number (not necessarily a positive integer) to be determined
by the DE.⁶ It is customary to choose $a_0 = 1$ because any constant multiple
of a solution is also a solution; so, if $a_0 \neq 1$, we simply multiply the series by
$1/a_0$ to make it so.⁷ Since a power series converges uniformly on every closed
interval inside its interval of convergence, it can be differentiated term by term there.
So, we have

$$\frac{dy}{dx} = \sum_{n=0}^{\infty} a_n (n+r) x^{n+r-1} = r a_0 x^{r-1} + (r+1) a_1 x^r + \cdots,$$

$$\frac{d^2y}{dx^2} = \sum_{n=0}^{\infty} a_n (n+r)(n+r-1) x^{n+r-2} = r(r-1) a_0 x^{r-2} + (r+1) r a_1 x^{r-1} + (r+2)(r+1) a_2 x^r + \cdots. \tag{26.9}$$

indicial equation

We now substitute Equations (26.8) and (26.9) in the DE (26.7), multiply
out the polynomials into the series, collect all distinct powers of $x$ together,
and set the coefficient of each term equal to zero. We thus obtain a set of
equations whose solution determines $r$ and the $a_n$’s. The equation arising from
the lowest power of $x$ involves only $r$ and is called the indicial equation.
This is usually a quadratic equation in $r$ which can be solved to obtain the

4 The DE may not emerge in the form given here out of, say, the separation of variables,
but it can be cast in that form. At worst, the coefficients of the derivatives in a DE are
typically rational functions (ratios of two polynomials). Therefore, multiplying the DE by
the product of all three denominators will cast it in the form given in (26.7).

5 For a thorough discussion of the Frobenius method, including motivation and proofs for 
the claims cited here, consult Hassani, S. Mathematical Physics: A Modern Introduction 
to Its Foundations , Springer-Verlag, 1999, Chapter 14. 

6 As Problem 26.2 indicates, one can start with a solution of the form (26.8) even when
$p_2(0) \neq 0$. The differential equation will then force $r$ to be 0 or 1, and either choice
yields an ordinary power series.

7 The choice $a_0 = 1$ is convenient only when $p_2(0) = 0$. If $p_2(0) \neq 0$, we need not
restrict $a_0$.


recursion relations

(two) possible values of $r$, each generally leading to a different solution. The
other equations, coming from higher powers of $x$, give recursion relations,
i.e., equations which give $a_n$ in terms of $a_{n-1}$ and $a_{n-2}$. By iterating these
relations, one can obtain all $a_n$’s in terms of only two, which can be determined
by the BCs. Let us summarize the procedure outlined above:

Theorem 26.1.1. (Frobenius method). To solve the DE (26.7), assume
a solution of the form (26.8). If $p_2(0) \neq 0$, choose $r = 0$, substitute $y$ and
its derivatives (26.9) in the DE, multiply out, collect all powers of $x$, and set
their coefficients equal to zero. If $p_2(0) = 0$, let $a_0 = 1$ and solve the indicial
equation to obtain $r$. Set the coefficients of all other powers of $x$ equal to zero
to find the recursion relation giving $a_n$ in terms of $a_{n-1}$ and $a_{n-2}$. Use this
relation and the values of $r$ obtained above to find all $a_n$’s in terms of only
two.


26.2 Legendre Polynomials 


We now apply the Frobenius method to the Legendre DE, for which (using $u$ as the independent variable) $p_2(u) = 1 - u^2$, $p_1(u) = -2u$, and $p_0(u) = \alpha$. Since $p_2(0) \neq 0$, we need not introduce an extra power $u^r$ in the series. Therefore, we may write


$$P(u) = \sum_{n=0}^{\infty} a_n u^n = a_0 + a_1 u + a_2 u^2 + a_3 u^3 + \cdots,$$

$$\frac{dP}{du} = \sum_{n=1}^{\infty} n a_n u^{n-1} = a_1 + 2a_2 u + 3a_3 u^2 + \cdots = \sum_{n=0}^{\infty} (n+1) a_{n+1} u^n,$$

$$\frac{d^2P}{du^2} = \sum_{n=2}^{\infty} n(n-1) a_n u^{n-2} = 2a_2 + 6a_3 u + 12a_4 u^2 + \cdots = \sum_{n=0}^{\infty} (n+1)(n+2) a_{n+2} u^n.$$

Multiplying each of the expressions above by its corresponding polynomial, we obtain

$$\alpha P(u) = \alpha a_0 + \alpha a_1 u + \alpha a_2 u^2 + \alpha a_3 u^3 + \cdots = \sum_{n=0}^{\infty} \alpha a_n u^n,$$

$$-2u\frac{dP}{du} = -2a_1 u - 4a_2 u^2 - 6a_3 u^3 - \cdots = -\sum_{n=0}^{\infty} 2(n+1) a_{n+1} u^{n+1},$$

$$
\begin{aligned}
(1-u^2)\frac{d^2P}{du^2} &= 2a_2 + 6a_3 u + 12a_4 u^2 + 20a_5 u^3 + \cdots - 2a_2 u^2 - 6a_3 u^3 - 12a_4 u^4 - 20a_5 u^5 - \cdots \\
&= 2a_2 + 6a_3 u + (12a_4 - 2a_2)u^2 + (20a_5 - 6a_3)u^3 + \cdots.
\end{aligned}
$$






We add these three series, noting that their sum must equal zero:

$$
\begin{aligned}
0 &= \alpha a_0 + \alpha a_1 u + \alpha a_2 u^2 + \alpha a_3 u^3 - 2a_1 u - 4a_2 u^2 - 6a_3 u^3 + 2a_2 + 6a_3 u \\
&\quad + (12a_4 - 2a_2)u^2 + (20a_5 - 6a_3)u^3 + \cdots \\
&= (\alpha a_0 + 2a_2) + [(\alpha - 2)a_1 + 6a_3]u + [(\alpha - 6)a_2 + 12a_4]u^2 + [(\alpha - 12)a_3 + 20a_5]u^3 + \cdots.
\end{aligned}
$$


The reader may note the pattern emerging in the expression for the coefficients. In fact, the coefficient of $u^n$ can be written as $[\alpha - n(n+1)]a_n + (n+1)(n+2)a_{n+2}$. Setting this coefficient equal to zero, we obtain the recursion relation

$$a_{n+2} = \frac{n(n+1) - \alpha}{(n+1)(n+2)}\,a_n, \qquad n = 0, 1, 2, \ldots, \tag{26.10}$$

which gives all $a_n$'s in terms of $a_0$ and $a_1$.
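The recursion (26.10) is easy to run mechanically. The following is a minimal sketch, assuming Python 3 and its standard `fractions` module so that the coefficients stay exact rationals; the function name `legendre_series_coeffs` is our own and does not come from the text.

```python
from fractions import Fraction


def legendre_series_coeffs(alpha, a0, a1, n_terms):
    """Return [a_0, a_1, ..., a_{n_terms-1}] for the series solution of the Legendre DE.

    Uses the recursion (26.10):
        a_{n+2} = (n(n+1) - alpha) / ((n+1)(n+2)) * a_n.
    alpha, a0 and a1 should be ints or Fractions so the arithmetic stays exact.
    """
    a = [Fraction(a0), Fraction(a1)]
    for n in range(n_terms - 2):
        a.append(Fraction(n * (n + 1) - alpha, (n + 1) * (n + 2)) * a[n])
    return a[:n_terms]


# alpha = 6 with a_1 = 0: the even coefficients stop after a_2 = -3 a_0.
print(legendre_series_coeffs(6, 1, 0, 7))
```

Even and odd coefficients never mix, because (26.10) steps the index by two; this is why $a_0$ and $a_1$ can be chosen independently.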

Although writing out the series term by term is a sure way of arriving at each individual coefficient and, through the discovery of a pattern, at the recursion relation, manipulating the summation symbols can also lead to the recursion relation without any reliance on pattern recognition. We go through the details of such a manipulation as a noteworthy exercise in working with summation signs. The general procedure is to write all sums in such a way that the exponent of $u$ agrees in all of them. To be specific, we write all sums over $n$ so that the power of $u$ is $n$. This may require redefining the summation index. So, the last term of the DE can be expressed as


recursion relation 
for the Legendre 
equation 


How to get to the 
recursion relation 
(26.10) by 
manipulating 
summations. 


$$\alpha P(u) = \sum_{n=0}^{\infty} \alpha a_n u^n. \tag{26.11}$$

The term involving the first derivative is

$$-2u\frac{dP}{du} = -2u\sum_{n=1}^{\infty} n a_n u^{n-1} = -\sum_{n=1}^{\infty} 2n a_n u^n, \tag{26.12}$$


and the term involving the second derivative becomes

$$
\begin{aligned}
(1-u^2)\sum_{n=2}^{\infty} n(n-1)a_n u^{n-2}
&= \sum_{n=2}^{\infty} n(n-1)a_n u^{n-2} - \sum_{n=2}^{\infty} n(n-1)a_n u^n \\
&= \sum_{n=0}^{\infty} (n+2)(n+1)a_{n+2} u^n - \sum_{n=2}^{\infty} n(n-1)a_n u^n,
\end{aligned}
\tag{26.13}
$$

where in the first sum we replaced $n$ with $n+2$ to change the power of $u$ from $n-2$ to $n$.$^{8}$ The power of $u$ in all the sums is now $n$.

8 The phrase “in the first sum we replaced $n$ with $n+2$” is an abbreviation for a procedure whereby first a new dummy index $m$ is defined by $m = n - 2$ (or $n = m + 2$), and then it is changed back to $n$.


The next step is to separate a sufficient number of terms of the “longer sums” so that all sums start with the same $n$, corresponding to the shortest sum. In the case at hand, the shortest sum is the one that starts with $n = 2$. So, rewrite the sums in Equations (26.11), (26.12), and (26.13) as

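$$
\begin{aligned}
\alpha P(u) &= \alpha a_0 + \alpha a_1 u + \sum_{n=2}^{\infty} \alpha a_n u^n, \\
-2u\frac{dP}{du} &= -2a_1 u - \sum_{n=2}^{\infty} 2n a_n u^n, \\
(1-u^2)\frac{d^2P}{du^2} &= 2a_2 + 6a_3 u + \sum_{n=2}^{\infty} (n+2)(n+1)a_{n+2} u^n - \sum_{n=2}^{\infty} n(n-1)a_n u^n.
\end{aligned}
$$

Adding these sums and noting that the sum of the left-hand sides is zero (it is the left-hand side of the Legendre DE) gives

$$
0 = \alpha a_0 + 2a_2 + (\alpha a_1 - 2a_1 + 6a_3)u
+ \sum_{n=2}^{\infty} \underbrace{\left[\alpha a_n - 2n a_n + (n+2)(n+1)a_{n+2} - n(n-1)a_n\right]}_{=\,[\alpha - n(n+1)]a_n + (n+2)(n+1)a_{n+2}} u^n.
$$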

By setting the coefficients of all powers of $u$ equal to zero, we obtain

$$\alpha a_0 + 2a_2 = 0, \qquad (\alpha - 2)a_1 + 6a_3 = 0, \qquad [\alpha - n(n+1)]a_n + (n+2)(n+1)a_{n+2} = 0,$$

with the first two being special cases (for $n = 0$ and $n = 1$) of the last one, which in turn happens to be the recursion relation (26.10).

Equation (26.10) is at the heart of the solution to the Legendre DE. It generates all the $a_n$'s with even $n$ from $a_0$, and all the odd $a_n$'s from $a_1$. We derive a general formula for the even $a_n$'s and leave the odd case to the reader. For $n = 0$, Equation (26.10) gives $a_2 = -(\alpha/2)a_0$, and for $n = 2$ we obtain


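$$a_4 = \frac{2\cdot 3 - \alpha}{4\cdot 3}\,a_2 = \frac{2\cdot 3 - \alpha}{4\cdot 3}\left(-\frac{\alpha}{2}\right)a_0 = \frac{\alpha(\alpha - 2\cdot 3)}{4!}\,a_0.$$

Similarly, for $n = 4$, we get

$$a_6 = \frac{4\cdot 5 - \alpha}{6\cdot 5}\,a_4 = -\frac{\alpha(\alpha - 2\cdot 3)(\alpha - 4\cdot 5)}{6!}\,a_0.$$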


The reader may easily check that

$$a_8 = \frac{\alpha(\alpha - 2\cdot 3)(\alpha - 4\cdot 5)(\alpha - 6\cdot 7)}{8!}\,a_0.$$

All these equations show a pattern that can be generalized to

$$a_{2n} = (-1)^n\,\frac{\alpha(\alpha - 2\cdot 3)(\alpha - 4\cdot 5)\cdots[\alpha - (2n-2)(2n-1)]}{(2n)!}\,a_0. \tag{26.14}$$
Similarly, the odd terms can be calculated with the result

$$a_{2n+1} = (-1)^n\,\frac{(\alpha - 1\cdot 2)(\alpha - 3\cdot 4)(\alpha - 5\cdot 6)\cdots[\alpha - (2n-1)2n]}{(2n+1)!}\,a_1. \tag{26.15}$$

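Inserting these coefficients in the series expansion of $P(u)$, we obtain

$$
\begin{aligned}
P(u) &= a_0\sum_{n=0}^{\infty}(-1)^n\,\frac{\alpha(\alpha - 2\cdot 3)(\alpha - 4\cdot 5)\cdots[\alpha - (2n-2)(2n-1)]}{(2n)!}\,u^{2n} \\
&\quad + a_1\sum_{n=0}^{\infty}(-1)^n\,\frac{(\alpha - 2)(\alpha - 3\cdot 4)(\alpha - 5\cdot 6)\cdots[\alpha - (2n-1)2n]}{(2n+1)!}\,u^{2n+1},
\end{aligned}
\tag{26.16}
$$

where, for $n = 0$, the products in the numerators are understood to be 1.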
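As a consistency check on (26.14) and (26.15), one can compare the closed forms with the output of the recursion (26.10) for a few sample values of $\alpha$, including a non-integer one. A minimal sketch, assuming Python 3 with `fractions` and `math.factorial`; all function names and the sample values of $\alpha$ are illustrative choices.

```python
from fractions import Fraction
from math import factorial


def by_recursion(alpha, n_terms, a0=1, a1=1):
    """a_0, ..., a_{n_terms-1} from the recursion (26.10)."""
    a = [Fraction(a0), Fraction(a1)]
    for n in range(n_terms - 2):
        a.append(Fraction(n * (n + 1) - alpha, (n + 1) * (n + 2)) * a[n])
    return a


def even_closed(alpha, n, a0=1):
    """a_{2n} from (26.14)."""
    prod = Fraction(1)
    for j in range(n):                      # factors alpha - (2j)(2j+1), j = 0..n-1
        prod *= alpha - (2 * j) * (2 * j + 1)
    return (-1) ** n * prod / factorial(2 * n) * a0


def odd_closed(alpha, n, a1=1):
    """a_{2n+1} from (26.15)."""
    prod = Fraction(1)
    for j in range(1, n + 1):               # factors alpha - (2j-1)(2j), j = 1..n
        prod *= alpha - (2 * j - 1) * (2 * j)
    return (-1) ** n * prod / factorial(2 * n + 1) * a1


def closed_forms_agree(alpha, n_max=6):
    """True if (26.14) and (26.15) reproduce the recursion up to index 2*n_max + 1."""
    a = by_recursion(alpha, 2 * n_max + 2)
    return all(a[2 * n] == even_closed(alpha, n) and a[2 * n + 1] == odd_closed(alpha, n)
               for n in range(n_max + 1))


for alpha in (Fraction(7, 3), 5, 12):
    print(alpha, closed_forms_agree(alpha))
```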


If either of the series in Equation (26.16) is to have physical utility, it must be convergent. The appearance of $(-1)^n$ may lead us to believe that the series is alternating. This is not true, because the terms involving $\alpha$ could be positive as well as negative. So, we cannot use the alternating series test. Let us use the ratio test instead. We apply it to the even series; the calculation for the odd series is identical. Calling the entire $n$th term of the series $c_n$, we have


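$$\lim_{n\to\infty}\left|\frac{c_{n+1}}{c_n}\right| = \lim_{n\to\infty}\left|\frac{a_{2n+2}u^{2n+2}}{a_{2n}u^{2n}}\right| = \lim_{n\to\infty}\left|\frac{2n(2n+1) - \alpha}{(2n+1)(2n+2)}\right|u^2 = u^2.$$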


So, when $u^2 < 1$, the series converges. Recall that $u = \cos\theta$, and $\theta = 0, \pi$ are points of physical interest corresponding to $u = \pm 1$. Therefore, the series ought to converge there. At $u^2 = 1$, however, the ratio test is inconclusive, so we apply the generalized ratio test. For very large $n$ and $u^2 = 1$, we have

$$\frac{c_{n+1}}{c_n} = \frac{2n(2n+1) - \alpha}{(2n+1)(2n+2)} = 1 - \frac{1}{n+1} - \frac{\alpha}{(2n+1)(2n+2)} \approx 1 - \frac{1}{n},$$

and the generalized ratio test implies divergence for the series! This conclusion holds for both the even and odd series of (26.16).
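The divergence at $u^2 = 1$ can also be seen numerically. The sketch below (Python 3, standard library only; names are ours) evaluates the exact term ratio of the even series at $u = 1$ and the quantity $n(1 - c_{n+1}/c_n)$, which should approach 1, consistent with $c_{n+1}/c_n \approx 1 - 1/n$. The value $\alpha = 3$ is an arbitrary example that is not of the form $k(k+1)$.

```python
from fractions import Fraction


def even_ratio_at_one(alpha, n):
    """Exact c_{n+1}/c_n for the even series of (26.16) at u^2 = 1."""
    return Fraction(2 * n * (2 * n + 1) - alpha, (2 * n + 1) * (2 * n + 2))


def gauss_exponent(alpha, n):
    """n * (1 - c_{n+1}/c_n); a limit of 1 means the ratio behaves like 1 - 1/n."""
    return n * (1 - even_ratio_at_one(alpha, n))


def even_partial_sum_at_one(alpha, n_terms, a0=1.0):
    """Floating-point partial sum c_0 + ... + c_{n_terms-1} of the even series at u = 1."""
    term = total = a0
    for n in range(n_terms - 1):
        term *= (2 * n * (2 * n + 1) - alpha) / ((2 * n + 1) * (2 * n + 2))
        total += term
    return total


for n in (10, 1000, 100000):
    print(n, float(gauss_exponent(3, n)))
# The partial sums keep drifting instead of settling down as n_terms grows.
for n_terms in (10**2, 10**3, 10**5):
    print(n_terms, even_partial_sum_at_one(3, n_terms))
```

The terms behave like a constant times $1/n$, so the partial sums drift roughly like a multiple of $\ln n$: slowly, but without bound.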

The generalized ratio test shows that either of the series in Equation (26.16) diverges.

To make the series convergent, truncate it into a finite sum!

There is a way of making the series convergent. Recall that the parameter $\alpha$ is completely arbitrary. In particular, we can, if it is helpful, put restrictions on it. Can we choose $\alpha$ in such a way that the series converges? We note that as long as the series is infinite, we have no luck, because we get back to the generalized ratio test and divergence. However, if we choose $\alpha$ so that all $a_n$'s after a certain finite number of terms vanish, then the series turns into a finite sum, and $P(u)$ becomes a polynomial for which the question of convergence is irrelevant. So, let us assume that $a_0$, $a_2$, and all the other even-indexed $a$'s up to $a_{2k}$ can have nonzero values, but all the remaining even-indexed coefficients are to be zero. All we need to do is to choose $\alpha$ so that $a_{2k+2}$ vanishes; then the recursion relation guarantees the vanishing of $a_{2k+4}$, $a_{2k+6}$, etc. Since

$$a_{2k+2} = \frac{2k(2k+1) - \alpha}{(2k+1)(2k+2)}\,a_{2k},$$

we must choose $\alpha = 2k(2k+1)$. A similar argument yields $\alpha = (2k-1)2k$ for the odd series.
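To see the truncation at work, one can run the recursion with $\alpha = 2k(2k+1)$ and look for the last nonzero coefficient. A minimal Python sketch with exact rationals; `k = 3`, the cutoff of 20 coefficients, and the helper names are arbitrary choices for illustration.

```python
from fractions import Fraction


def coeffs(alpha, n_terms, a0, a1):
    """a_0, ..., a_{n_terms-1} from the recursion (26.10)."""
    a = [Fraction(a0), Fraction(a1)]
    for n in range(n_terms - 2):
        a.append(Fraction(n * (n + 1) - alpha, (n + 1) * (n + 2)) * a[n])
    return a


def last_nonzero_index(seq):
    return max((i for i, c in enumerate(seq) if c != 0), default=-1)


k = 3
alpha = 2 * k * (2 * k + 1)                      # 42: chosen to truncate the even series
even = coeffs(alpha, 20, a0=1, a1=0)
odd = coeffs(alpha, 20, a0=0, a1=1)
print(last_nonzero_index(even))                  # 6, i.e. 2k: nothing beyond u^{2k}
print(all(c != 0 for c in odd[1::2]))            # True: the odd series does not stop
```

The second line of output anticipates the point made next in the text: the value of $\alpha$ that truncates one parity leaves the other parity infinite.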

Choosing $\alpha$ to turn one of the infinite sums into a finite polynomial is only a partial solution to the problem. Is it possible to choose $\alpha$ so that both the odd and the even series are truncated after a finite number of terms? Suppose we have chosen $\alpha$ to be $2k(2k+1)$, so that the even series has no term beyond $a_{2k}u^{2k}$. The recursion relation for the odd series can be written as


$$a_{2n+1} = \frac{(2n-1)2n - 2k(2k+1)}{(2n+1)2n}\,a_{2n-1}.$$

Setting the numerator equal to zero gives a quadratic equation, which can be solved for $n$ to obtain

$$n = -k \qquad\text{or}\qquad n = k + \tfrac{1}{2}.$$

Neither of these is a positive integer! Thus, the value of $\alpha$ chosen to truncate the even series does not allow the truncation of the odd series. To avoid this dilemma, we resort to a choice of another arbitrary constant, $a_1$. By setting $a_1$ equal to zero, we completely avoid the odd series. Conversely, if $\alpha = (2k-1)2k$ is chosen to truncate the odd series, then $a_0$ ought to be set equal to zero. By convention, $a_0$ and $a_1$ are determined so that $P(1) = 1$. Let us summarize our findings:

Theorem 26.2.1. A solution to the Legendre DE (26.5) that is finite at both $u = 1$ and $u = -1$ exists only if $\alpha = k(k+1)$, where $k$ is a nonnegative integer. The corresponding solution is denoted by $P_k(u)$ and is a polynomial of degree $k$, called the $k$th Legendre polynomial, which has only even powers of $u$ if $k$ is even and only odd powers of $u$ if $k$ is odd. By convention $P_k(1) = 1$ for all $k$.

Thus, for each $k$ we have a different solution, and a different $a_0$ or $a_1$ to evaluate. That is why it is more appropriate to write $C_k$ for either $a_0$ or $a_1$. We can use either (26.14) and (26.15), or the recursion relation (26.10), to find the coefficients of each polynomial.

calculation of the 
first five Legendre 
polynomials using 
the recursion 
relation 


Example 26.2.2. We calculate the Legendre polynomials up to order 4 using the recursion relation. $P_0$ is of degree zero, so it is a constant, and $P_k(1) = 1$ forces it to be 1. So, $P_0(u) = 1$. Since $P_1(u)$ is of degree 1 with no even “powers” of $u$, it can only be of the form $C_1 u$, where $C_1$ is a constant. But $P_1(1) = 1$; so $C_1 = 1$ and $P_1(u) = u$. For $P_2$, $\alpha = 2\cdot 3 = 6$, and the recursion relation gives

$$a_2 = -\frac{6}{2}\,a_0 = -3C_2 \quad\Longrightarrow\quad P_2(u) = C_2 - 3C_2 u^2$$

because $P_2(u)$ has no $u$ term. For $P_2(1)$ to be equal to 1, we must have $C_2 = -\tfrac{1}{2}$, so that $P_2(u) = \tfrac{1}{2}(3u^2 - 1)$. For $P_3$, $\alpha = 3\cdot 4 = 12$, and the recursion relation gives

$$a_3 = \frac{2 - 12}{6}\,a_1 = -\frac{10}{6}\,C_3 = -\frac{5}{3}\,C_3 \quad\Longrightarrow\quad P_3(u) = C_3 u - \frac{5}{3}\,C_3 u^3$$

because $P_3(u)$ has no constant or $u^2$ term. For $P_3(1)$ to be equal to 1, we must have $C_3 = -\tfrac{3}{2}$, so that $P_3(u) = \tfrac{1}{2}(5u^3 - 3u)$. Finally, we calculate $P_4$, for which $\alpha = 4\cdot 5 = 20$, and the recursion relations give

$$a_2 = -\frac{20}{2}\,a_0 = -10C_4 \qquad\text{and}\qquad a_4 = \frac{6 - 20}{12}\,a_2 = \frac{35}{3}\,C_4,$$

and $P_4(u) = C_4 - 10C_4 u^2 + \tfrac{35}{3}C_4 u^4$. The condition $P_4(1) = 1$ gives $C_4 = 3/8$. Therefore,

$$P_4(u) = \tfrac{1}{8}(35u^4 - 30u^2 + 3).$$

Other Legendre polynomials can be obtained similarly. However, as we shall see shortly, there is a much easier way of calculating Legendre polynomials. ■
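The procedure of Example 26.2.2 (fix $\alpha = k(k+1)$, switch off the other parity, iterate (26.10), then rescale so that $P_k(1) = 1$) can be automated. A sketch in Python with exact fractions; the representation of a polynomial as the list `[c_0, ..., c_k]`, with `c_j` multiplying $u^j$, is our own convenience, as is the function name.

```python
from fractions import Fraction


def legendre_from_recursion(k):
    """Coefficients [c_0, ..., c_k] of P_k(u), with c_j multiplying u**j.

    Follows Example 26.2.2: alpha = k(k+1); the seed is a_0 = 1 for even k or
    a_1 = 1 for odd k (the other parity is set to zero); the recursion (26.10)
    fills in the rest; finally everything is divided by P(1) so that P_k(1) = 1.
    """
    alpha = k * (k + 1)
    a = [Fraction(0)] * (k + 1)
    a[k % 2] = Fraction(1)
    for n in range(k % 2, k - 1, 2):          # compute a_{n+2} from a_n
        a[n + 2] = Fraction(n * (n + 1) - alpha, (n + 1) * (n + 2)) * a[n]
    value_at_one = sum(a)                     # P(1) is the sum of the coefficients
    return [c / value_at_one for c in a]


for k in range(5):
    print(k, [str(c) for c in legendre_from_recursion(k)])
```

Normalizing at the end, rather than solving for $C_k$ by hand, is the same step as imposing $P_k(1) = 1$ in the example.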

With $\alpha$ determined to be of the form $k(k+1)$, we can now calculate all coefficients of the Legendre polynomials. We start by rewriting the recursion relation (26.10) as

$$a_n = \frac{(n-2)(n-1) - k(k+1)}{n(n-1)}\,a_{n-2} = -\frac{(k-n+2)(k+n-1)}{n(n-1)}\,a_{n-2}, \qquad n = 2, 3, \ldots, k. \tag{26.17}$$

Iterating this once, we obtain


$$
\begin{aligned}
a_n &= (-1)\frac{(k-n+2)(k+n-1)}{n(n-1)}\,(-1)\frac{(k-n+4)(k+n-3)}{(n-2)(n-3)}\,a_{n-4} \\
&= (-1)^2\,\frac{[(k-n+2)(k-n+4)]\,[(k+n-1)(k+n-3)]}{n(n-1)(n-2)(n-3)}\,a_{n-4}.
\end{aligned}
$$

By iterating a few times, the reader may check that

$$a_n = (-1)^m\,\frac{[(k-n+2)(k-n+4)\cdots(k-n+2m)]\,[(k+n-1)(k+n-3)\cdots(k+n-2m+1)]}{n(n-1)\cdots(n-2m+1)}\,a_{n-2m}. \tag{26.18}$$


To proceed, we need to treat the cases of even and odd $n$ separately. We treat the even case and leave the odd case as an exercise for the reader. Let us assume that $n = 2m$; then $k$ must also be even,$^{9}$ and the last equation above yields

$$
\begin{aligned}
a_{2m} &= (-1)^m\,\frac{[(2j)(2j-2)\cdots(2j-2m+2)]\,[(2j+2m-1)\cdots(2j+1)]}{2m(2m-1)\cdots 1}\,a_0 \\
&= (-1)^m\,\frac{[(2j)!!/(2j-2m)!!]\,[(2j+2m-1)!!/(2j-1)!!]}{(2m)!}\,a_0,
\end{aligned}
$$


9 Recall from our discussion above that even Legendre polynomials correspond to $\alpha = 2j(2j+1)$, i.e., to even $k = 2j$.
where we set $k = 2j$. Using the relations (see Problem 11.1)

$$(2l-1)!! = \frac{(2l)!}{2^l\,l!}, \qquad (2l)!! = 2^l\,l!,$$

we finally obtain


even Legendre 
polynomial 


odd Legendre 
polynomial 


$$a_{2m} = (-1)^m\,\frac{(j!)^2}{(2j)!}\,\frac{(2j+2m)!}{(j+m)!\,(j-m)!\,(2m)!}\,a_0 = A_j\,(-1)^m\,\frac{(2j+2m)!}{(j+m)!\,(j-m)!\,(2m)!}, \tag{26.19}$$

where $A_j = a_0\,(j!)^2/(2j)!$. The reader may check that


$$a_{2m+1} = B_j\,(-1)^m\,\frac{(2j+2m+2)!}{(j+m+1)!\,(j-m)!\,(2m+1)!} \tag{26.20}$$

for some constant $B_j$. Therefore, the even Legendre polynomials will be given by


$$P_{2j}(x) = A_j\sum_{m=0}^{j}(-1)^m\,\frac{(2j+2m)!}{(j+m)!\,(j-m)!\,(2m)!}\,x^{2m}, \tag{26.21}$$

and the odd polynomials by

$$P_{2j+1}(x) = B_j\sum_{m=0}^{j}(-1)^m\,\frac{(2j+2m+2)!}{(j+m+1)!\,(j-m)!\,(2m+1)!}\,x^{2m+1}. \tag{26.22}$$


We now introduce a new summation index $r = j - m$ in either sum and let $n = 2j$ in the even sum and $n = 2j + 1$ in the odd sum. Then both sums can be written simply as

$$P_n(x) = K_n\sum_{r=0}^{[n/2]}(-1)^r\,\frac{(2n-2r)!}{(n-r)!\,r!\,(n-2r)!}\,x^{n-2r}, \tag{26.23}$$

where $[a]$, for any real number $a$, denotes the largest integer less than or equal to $a$, and $K_n$ is an arbitrary constant which, by convention, is taken to be $1/2^n$ so that $P_n(1) = 1$. This leads to

$$P_n(x) = \frac{1}{2^n}\sum_{r=0}^{[n/2]}(-1)^r\,\frac{(2n-2r)!}{(n-r)!\,r!\,(n-2r)!}\,x^{n-2r}. \tag{26.24}$$
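Equation (26.24) is a finite sum, so it translates directly into code. A minimal Python sketch using `math.factorial` and exact fractions; the function name is ours, and it should reproduce the polynomials of Example 26.2.2.

```python
from fractions import Fraction
from math import factorial


def legendre_explicit(n):
    """Coefficients [c_0, ..., c_n] of P_n(x) from the finite sum (26.24)."""
    c = [Fraction(0)] * (n + 1)
    for r in range(n // 2 + 1):               # r = 0, 1, ..., [n/2]
        c[n - 2 * r] = Fraction(
            (-1) ** r * factorial(2 * n - 2 * r),
            2 ** n * factorial(n - r) * factorial(r) * factorial(n - 2 * r),
        )
    return c


print([str(c) for c in legendre_explicit(4)])   # ['3/8', '0', '-15/4', '0', '35/8']
```

Summing the coefficients evaluates the polynomial at $x = 1$, which gives a quick check of the normalization $K_n = 1/2^n$.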


Referring to the definition of the hypergeometric function (11.23) and 
Equation (26.24), the reader may verify that 


$$P_{2n}(x) = (-1)^n\,\frac{(2n)!}{2^{2n}(n!)^2}\,F\!\left(-n, n + \tfrac{1}{2}; \tfrac{1}{2}; x^2\right) \tag{26.25}$$

and

$$P_{2n+1}(x) = (-1)^n\,\frac{(2n+1)!}{2^{2n}(n!)^2}\,x\,F\!\left(-n, n + \tfrac{3}{2}; \tfrac{3}{2}; x^2\right). \tag{26.26}$$
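Equations (26.25) and (26.26) can be checked term by term. The sketch below builds the terminating series $F(-n, b; c; z) = \sum_m \frac{(-n)_m (b)_m}{(c)_m\, m!} z^m$ from the usual Pochhammer-symbol definition, which we assume agrees with (11.23), and compares the result with $P_4$ and $P_5$. Python 3, standard library only; the function names are ours.

```python
from fractions import Fraction
from math import factorial


def hyp2f1_terminating(n, b, c):
    """Coefficients of F(-n, b; c; z) in powers of z (the series stops after z^n)."""
    coeffs, term = [], Fraction(1)
    for m in range(n + 1):
        coeffs.append(term)
        term = term * (-n + m) * (b + m) / ((c + m) * (m + 1))
    return coeffs


def legendre_via_2f1(N):
    """Coefficients [c_0, ..., c_N] of P_N(x) from (26.25) or (26.26)."""
    n, odd = divmod(N, 2)
    half = Fraction(1, 2)
    b, c = (n + 3 * half, 3 * half) if odd else (n + half, half)
    prefactor = Fraction((-1) ** n * factorial(2 * n + odd), 2 ** (2 * n) * factorial(n) ** 2)
    p = [Fraction(0)] * (N + 1)
    for m, f_m in enumerate(hyp2f1_terminating(n, b, c)):
        p[2 * m + odd] = prefactor * f_m     # F is a series in x^2; the odd case carries an extra x
    return p


print([str(c) for c in legendre_via_2f1(5)])   # ['0', '15/8', '0', '-35/4', '0', '63/8']
```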


Adrien-Marie Legendre came from a well-to-do Parisian family and received an 
excellent education in science and mathematics. His university work was advanced 
enough that his mentor used many of Legendre’s essays in a treatise on mechanics. 
A man of modest fortune until the revolution, Legendre was able to devote himself 
to study and research without recourse to an academic position. In 1782 he won the 
prize of the Berlin Academy for calculating the trajectories of cannonballs taking 
air resistance into account. This essay brought him to the attention of Lagrange 
and helped pave the way to acceptance in French scientific circles, notably the 
Academy of Sciences, to which Legendre submitted numerous papers. In July 1784 
he submitted a paper on planetary orbits that contained the now-famous Legendre 
polynomials, mentioning that Lagrange had been able to “present a more complete 
theory” in a recent paper by using Legendre’s results. In the years that followed, 
Legendre concentrated his efforts on number theory, celestial mechanics, and the 
theory of elliptic functions. In addition, he was a prolific calculator, producing 
large tables of the values of special functions, and he also authored an elementary 
textbook that remained in use for many decades. In 1824 Legendre refused to vote 
for the government’s candidate for the Institut National. Because of this, his pension 
was stopped and he died in poverty and in pain at the age of 80 after several years 
of failing health. 

Legendre produced a large number of useful ideas but did not always develop 
them in the most rigorous manner, claiming to hold the priority for an idea if 
he had presented merely a reasonable argument for it. Gauss, with whom he had 
several quarrels over priority, considered rigorous proof the standard of ownership. 
To Legendre’s credit, however, he was an enthusiastic supporter of his young rivals 
Abel and Jacobi and gave their work considerable attention in his writings. 

Legendre also contributed to practical efforts in science and mathematics. He 
and two of his contemporaries were assigned in 1787 to a panel conducting geodetic 
work in cooperation with the observatories at Paris and Greenwich. Four years 
later the same panel members were appointed as the Academy’s commissioners to 
undertake the measurements and calculations necessary to determine the length of 
the standard meter. Legendre’s seemingly tireless skill at calculating produced large 
tables of the values of trigonometric and elliptic functions, logarithms, and solutions 
to various special equations. 


26.3 Second Solution of the Legendre DE 


Recall that any second-order linear DE has two linearly independent solutions. We have so far found one solution of the Legendre DE in the form of the Legendre polynomials. Once we have these solutions, we can obtain a second solution using Equation (24.6). To conform with Equation (24.6), we need to reexpress the Legendre DE as

$$\frac{d^2y}{dx^2} - \frac{2x}{1-x^2}\,\frac{dy}{dx} + \frac{n(n+1)}{1-x^2}\,y = 0.$$



Adrien-Marie 

Legendre 

1752-1833 
Legendre
functions of the 
second kind 


This is a homogeneous second-order linear DE with

$$p(x) = -\frac{2x}{1-x^2} \qquad\text{and}\qquad q(x) = \frac{n(n+1)}{1-x^2}.$$

Using $P_n(x)$ as our input, we can generate another set of solutions. Let $Q_n(x)$ stand for the linearly independent “partner” of $P_n(x)$. Then, setting $C = 0$ in Equation (24.6) yields$^{10}$

$$Q_n(x) = K P_n(x)\int_a^x \frac{1}{P_n^2(s)}\exp\left[\int_c^s \frac{2t}{1-t^2}\,dt\right]ds.$$

But

$$\int_c^s \frac{2t}{1-t^2}\,dt = -\ln|1-t^2|\,\Big|_c^s = -\ln\left|\frac{1-s^2}{1-c^2}\right| = \ln\left|\frac{1-c^2}{1-s^2}\right|,$$

so that

$$\exp\left[\int_c^s \frac{2t}{1-t^2}\,dt\right] = \exp\left(\ln\left|\frac{1-c^2}{1-s^2}\right|\right) = \frac{|1-c^2|}{1-s^2},$$

because $s$, being the argument of a Legendre polynomial, is the cosine of an angle and therefore cannot exceed 1 in magnitude, so that $1 - s^2 > 0$ for $|s| < 1$. It now follows that

$$Q_n(x) = A_n P_n(x)\int_a^x \frac{ds}{(1-s^2)P_n^2(s)}, \tag{26.27}$$

where $A_n = K|1-c^2|$ is an arbitrary constant determined by convention, and $a$ is an arbitrary point in the interval $[-1, +1]$. The subscript of $A_n$ indicates that the constant may be different for different $n$. These new solutions are called Legendre functions of the second kind. Note that, unlike $P_n(x)$, $Q_n(x)$ is not well behaved at $x = \pm 1$ due to the presence of $1 - s^2$ in the denominator of the integrand of Equation (26.27). For this reason, we shall not use these second solutions in this book.


Example 26.3.1. Example 26.2.2 gives $P_0(x) = 1$. Therefore,

$$Q_0(x) = A_0\int_a^x \frac{ds}{1-s^2} = A_0\left[\frac{1}{2}\ln\frac{1+x}{1-x} - \frac{1}{2}\ln\frac{1+a}{1-a}\right].$$

The standard form of $Q_0(x)$ is obtained by setting $A_0 = 1$ and $a = 0$:

$$Q_0(x) = \frac{1}{2}\ln\frac{1+x}{1-x} \qquad\text{for } |x| < 1.$$

Similarly, since $P_1(x) = x$, we obtain

$$Q_1(x) = A_1 x\int^x \frac{ds}{(1-s^2)s^2} = Ax + \frac{B}{2}\,x\ln\frac{1+x}{1-x} + C \qquad\text{for } |x| < 1,$$

where $A$, $B$, and $C$ are constants, and to perform the integration, we used

$$\frac{1}{(1-s^2)s^2} = \frac{1}{s^2} + \frac{1}{2(1-s)} + \frac{1}{2(1+s)},$$

which renders the integral elementary. In the case of $Q_1(x)$, convention demands that $A = 0$, $B = 1$, and $C = -1$. Thus,

$$Q_1(x) = \frac{x}{2}\ln\frac{1+x}{1-x} - 1.$$

10 Since we are interested in a different second solution, we can ignore any constant multiple of the first solution that is added to the sought-after second solution.
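A quick numerical sanity check that the standard forms of $Q_0$ and $Q_1$ found above satisfy the Legendre DE $(1-x^2)y'' - 2xy' + n(n+1)y = 0$ for $|x| < 1$. The sketch (Python, `math` module only) estimates the derivatives with central finite differences, so the residuals are small but not exactly zero; the step `h = 1e-4` and the sample points are arbitrary choices.

```python
import math


def Q0(x):
    """Standard Q_0(x) = (1/2) ln((1+x)/(1-x)) for |x| < 1."""
    return 0.5 * math.log((1 + x) / (1 - x))


def Q1(x):
    """Standard Q_1(x) = (x/2) ln((1+x)/(1-x)) - 1 for |x| < 1."""
    return 0.5 * x * math.log((1 + x) / (1 - x)) - 1


def legendre_residual(f, n, x, h=1e-4):
    """(1 - x^2) f'' - 2x f' + n(n+1) f at x, using central finite differences."""
    d1 = (f(x + h) - f(x - h)) / (2 * h)
    d2 = (f(x + h) - 2 * f(x) + f(x - h)) / h ** 2
    return (1 - x * x) * d2 - 2 * x * d1 + n * (n + 1) * f(x)


for x in (-0.6, -0.2, 0.3, 0.7):
    print(x, legendre_residual(Q0, 0, x), legendre_residual(Q1, 1, x))
```

Pairing a function with the wrong order (for example $Q_0$ with $n = 1$) gives a residual of order one, which shows that the test distinguishes solutions from non-solutions.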


26.4 Complete Solution 

Having found the angular solution of Laplace’s equation, we now tackle the radial part. With $\alpha = k(k+1)$, we can write the first equation in (26.2) as

$$r^2\frac{d^2R}{dr^2} + 2r\frac{dR}{dr} - k(k+1)R = 0. \tag{26.28}$$

Since $p_2(0) = 0$, we have to consider a solution of the form $R(r) = r^s\sum_{n=0}^{\infty} b_n r^n$. Differentiating this series and substituting it in Equation (26.28) gives

$$\sum_{n=0}^{\infty}[(n+s)(n+s+1) - k(k+1)]\,b_n r^{n+s} = 0$$

or

$$[(n+s)(n+s+1) - k(k+1)]\,b_n = 0 \qquad\text{for } n = 0, 1, 2, \ldots.$$

In particular, for $n = 0$, and assuming that $b_0 \neq 0$, we obtain the indicial equation

$$s(s+1) - k(k+1) = 0 \quad\Longrightarrow\quad s = k \quad\text{or}\quad s = -k - 1.$$

For $s = k$, the equation for general nonzero $n$ gives

$$[(n+k)(n+k+1) - k(k+1)]\,b_n = 0 \quad\Longrightarrow\quad n(n+2k+1)\,b_n = 0.$$

Since neither $n$ nor $n + 2k + 1$ is zero, we have to conclude that $b_n = 0$ for all $n \geq 1$. Thus, for $s = k$, we obtain the solution $R(r) = A_k r^k$, where $A_k$ is an arbitrary constant (we called it $b_0$ before).

For $s = -k - 1$, we have

$$[(n-k-1)(n-k) - k(k+1)]\,b_n = 0 \quad\Longrightarrow\quad n(n-2k-1)\,b_n = 0,$$

for which we can have either $n = 2k + 1$ or $b_n = 0$. If $n = 2k + 1$, then

$$R(r) = r^{-k-1}\,b_{2k+1}r^{2k+1} = b_{2k+1}r^k,$$


an example of the 
indicial equation 


which is (a constant times) what we already have. So assume that $n \neq 2k + 1$. Then $b_n = 0$ for all $n \geq 1$, and we obtain the solution $R(r) = B_k r^{-k-1}$, where






Laplace’s Equation: Spherical Coordinates 


the most general 
solution of the 
spherical radial DE 


From a known 
solution of 
Laplace’s 
equation, we find 
a formula that 
generates all 
Legendre 
polynomials. 


$B_k$ is another arbitrary constant. It follows that the most general solution of the radial DE is

$$R_k(r) = A_k r^k + \frac{B_k}{r^{k+1}}, \qquad k = 0, 1, 2, \ldots.$$

We can now put the radial and the angular parts together: 

Theorem 26.4.1. To find the most general azimuthally symmetric solution of Laplace’s equation in spherical coordinates, we multiply the radial solution and the angular solution (Legendre polynomial) for each $k$ and sum over all possible values of $k$:

$$\Phi(r, \theta) = \sum_{k=0}^{\infty}\left(A_k r^k + \frac{B_k}{r^{k+1}}\right)P_k(\cos\theta), \tag{26.29}$$

where we have substituted $\cos\theta$ for $u$.

Equation (26.29) gives the general solution of Laplace’s equation, and we shall consider examples of how to use it to solve some representative problems, but first we will go backward: from a particular known solution of Laplace’s equation, we want to find an important property of Legendre polynomials. Equation (15.18) shows that $1/|\mathbf{r} - \mathbf{r}_0|$ is a solution of Laplace’s equation at all points of space except $\mathbf{r}_0$. In general, $|\mathbf{r} - \mathbf{r}_0|$ is not azimuthally symmetric. However, if we place $\mathbf{r}_0$ along the $z$-axis, the $\varphi$-dependence will disappear. In fact, with $\mathbf{r}_0 = a\mathbf{e}_z$, we have


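$$|\mathbf{r} - \mathbf{r}_0| = |\mathbf{r} - a\mathbf{e}_z| = \sqrt{r^2 + a^2 - 2ar\cos\theta}.$$

According to (26.29), the solution $1/|\mathbf{r} - a\mathbf{e}_z|$ can be written as a series:

$$\frac{1}{\sqrt{r^2 + a^2 - 2ar\cos\theta}} = \sum_{k=0}^{\infty}\left(A_k r^k + \frac{B_k}{r^{k+1}}\right)P_k(\cos\theta).$$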

We are interested in the region of space inside the sphere of radius $a$. Since the origin is included in this region, no negative powers of $r$ are allowed. Therefore, all coefficients of such powers must be zero, i.e., $B_k = 0$. To determine the other set of coefficients, evaluate both sides at $\theta = 0$ and use $P_k(1) = 1$. This gives

$$\frac{1}{\sqrt{r^2 + a^2 - 2ar}} = \frac{1}{|r - a|} = \frac{1}{a - r} = \sum_{k=0}^{\infty} A_k r^k.$$


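Using the result of Example 9.3.3 and the fact that $r/a < 1$, the LHS can be expanded in powers of $r/a$:

$$\frac{1}{a - r} = \frac{1}{a(1 - r/a)} = \frac{1}{a}\sum_{k=0}^{\infty}\left(\frac{r}{a}\right)^k = \sum_{k=0}^{\infty}\frac{r^k}{a^{k+1}}.$$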
Comparison of the last two equations gives $A_k = 1/a^{k+1}$. It follows that

$$\frac{1}{\sqrt{r^2 + a^2 - 2ar\cos\theta}} = \frac{1}{a}\sum_{k=0}^{\infty}\left(\frac{r}{a}\right)^k P_k(\cos\theta), \qquad r < a.$$


Introducing $t = r/a$ and $u = \cos\theta$ on both sides, we finally obtain the important relation

$$g(t, u) \equiv \frac{1}{\sqrt{1 + t^2 - 2tu}} = \sum_{k=0}^{\infty} t^k P_k(u). \tag{26.30}$$


The RHS can be considered as a Taylor (or Maclaurin) series in t for the 
function on the LHS. 


Theorem 26.4.2. The $k$th coefficient of the Maclaurin series expansion of $g(t, u) = 1/\sqrt{1 + t^2 - 2tu}$ about $t = 0$ is $P_k(u)$. Specifically,

$$P_k(u) = \frac{1}{k!}\,\frac{\partial^k}{\partial t^k}\left.\frac{1}{\sqrt{1 + t^2 - 2tu}}\right|_{t=0}. \tag{26.31}$$

The function $g(t, u)$ is called the generating function of the Legendre polynomials.
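Theorem 26.4.2 can be checked numerically by comparing $g(t, u)$ with a truncated version of the series (26.30). The Python sketch below evaluates $P_k$ from the explicit sum (26.24) in floating point. That is adequate for modest $k$; for large $k$ the alternating sum loses precision, and the recurrence relation of Section 26.5.2 would be the better choice. The cutoff `k_max = 30` and the sample points are arbitrary.

```python
from math import factorial, sqrt


def legendre_value(n, u):
    """P_n(u) evaluated from the explicit finite sum (26.24)."""
    return sum(
        (-1) ** r * factorial(2 * n - 2 * r)
        / (2 ** n * factorial(n - r) * factorial(r) * factorial(n - 2 * r))
        * u ** (n - 2 * r)
        for r in range(n // 2 + 1)
    )


def generating_function(t, u):
    """g(t, u) = 1 / sqrt(1 + t^2 - 2 t u), the LHS of (26.30)."""
    return 1 / sqrt(1 + t * t - 2 * t * u)


def truncated_series(t, u, k_max):
    """sum_{k=0}^{k_max} t^k P_k(u), the RHS of (26.30) cut off at k_max."""
    return sum(t ** k * legendre_value(k, u) for k in range(k_max + 1))


for t, u in ((0.3, 0.7), (-0.2, -0.4), (0.25, 0.0)):
    print(t, u, generating_function(t, u) - truncated_series(t, u, 30))
```

Since $|P_k(u)| \le 1$ on $[-1, 1]$, the neglected tail is bounded by $\sum_{k > 30} |t|^k$, which is negligible for these values of $t$.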


Example 26.4.3. As an immediate application of the generating function to potential theory, consider the electrostatic or gravitational potential, which can be written as

$$\Phi(\mathbf{r}) = K\int_{\Omega}\frac{dQ(\mathbf{r}')}{|\mathbf{r} - \mathbf{r}'|}, \tag{26.32}$$

where $K$ is $k_e$ for electrostatics and $-G$ for gravity, and $Q$ represents either electric charge or mass. Assuming that $r > r'$ for every source point, we can expand in powers of the ratio $r'/r$, which we denote by $t$. The key to this expansion is the following power series of $1/|\mathbf{r} - \mathbf{r}'|$:

$$\frac{1}{|\mathbf{r} - \mathbf{r}'|} = \frac{1}{\sqrt{r^2 + r'^2 - 2\mathbf{r}\cdot\mathbf{r}'}} = \frac{1}{r}\,\frac{1}{\sqrt{1 + t^2 - 2t\cos\gamma}} = \frac{1}{r}\sum_{k=0}^{\infty} t^k P_k(\cos\gamma),$$

where $\gamma$ is the angle between $\mathbf{r}$ and $\mathbf{r}'$ and we used Equation (26.30). Substituting this expansion for $1/|\mathbf{r} - \mathbf{r}'|$ in (26.32), we obtain


$$\Phi(\mathbf{r}) = K\int_{\Omega}\sum_{k=0}^{\infty}\frac{r'^k}{r^{k+1}}\,P_k(\cos\gamma)\,dQ(\mathbf{r}') = K\sum_{k=0}^{\infty}\frac{Q_k}{r^{k+1}}, \tag{26.33}$$


Legendre 
polynomial and 
multipole 
expansion 


where we replaced $t$ with $r'/r$ and introduced $Q_k$, the so-called $k$th moment of the source (charge or mass), by$^{11}$

$$Q_k = \int_{\Omega} r'^k P_k(\cos\gamma)\,dQ(\mathbf{r}'). \tag{26.34}$$

11 Do not confuse this $Q_k$ with the second solution of the Legendre DE introduced in Equation (26.27).
Recall that $\cos\gamma$ depends on $\theta$ and $\varphi$. Thus, once the integral over $\Omega$ is done, the result will depend on $\theta$ and $\varphi$, as it should, because $\Phi(\mathbf{r})$ is, in general, dependent on these angles.

The moments $Q_k$ are supposed to describe the intrinsic properties of charge (or mass) distributions and should not depend on the observation point, which is described, in part, by $\theta$ and $\varphi$. This is the reason that Cartesian coordinates are more useful (at this level of presentation) than spherical coordinates. In Cartesian coordinates, we can separate the primed from the unprimed coordinates (as we did in the definition of dipole in Chapter 10 and of quadrupole in Chapter 17), and define multipole moments entirely in terms of the density function of the distribution of the source. This does not mean, however, that a complete separation is impossible in spherical coordinates. In fact, there are techniques for performing such a separation, in terms of the so-called “spherical harmonics,” but they are much more complicated and beyond the scope of this book.$^{12}$ ■
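To see (26.33) and (26.34) in action, the sketch below compares the direct potential of a few point charges, for which $dQ$ reduces to a sum of point sources, with the truncated multipole sum. It is written in Python with only the `math` module (Python 3.8+ for `math.dist`), evaluates $P_k$ with the recurrence (26.38) of Section 26.5.2, and uses $K = 1$. The charges, positions, observation point, and cutoff are made-up values chosen so that $r > r'$ for every source.

```python
import math


def legendre(n, x):
    """P_n(x) from the recurrence (26.38), starting with P_0 = 1 and P_1 = x."""
    if n == 0:
        return 1.0
    p_prev, p = 1.0, x
    for m in range(1, n):
        p_prev, p = p, ((2 * m + 1) * x * p - m * p_prev) / (m + 1)
    return p


def direct_potential(charges, r, K=1.0):
    """K * sum_i q_i / |r - r_i|, i.e. (26.32) for point sources."""
    return K * sum(q / math.dist(r, rp) for q, rp in charges)


def multipole_potential(charges, r, k_max, K=1.0):
    """K * sum_{k<=k_max} Q_k / r^(k+1), with Q_k = sum_i q_i r_i^k P_k(cos gamma_i)."""
    r_len = math.hypot(*r)
    total = 0.0
    for k in range(k_max + 1):
        q_k = 0.0
        for q, rp in charges:
            rp_len = math.hypot(*rp)
            if rp_len == 0.0:                      # a source at the origin only feeds Q_0
                q_k += q if k == 0 else 0.0
                continue
            cos_gamma = sum(a * b for a, b in zip(r, rp)) / (r_len * rp_len)
            q_k += q * rp_len ** k * legendre(k, cos_gamma)
        total += q_k / r_len ** (k + 1)
    return K * total


charges = [(1.0, (0.3, 0.0, 0.1)), (-2.0, (-0.2, 0.25, 0.0)), (0.5, (0.0, 0.0, -0.4))]
r_obs = (1.5, -0.8, 2.0)
print(direct_potential(charges, r_obs), multipole_potential(charges, r_obs, 20))
```

As the text notes, each $Q_k$ computed this way depends on the direction of the observation point through $\cos\gamma$; only $Q_0$, the total charge, does not.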


26.5 Properties of Legendre Polynomials 

From the Legendre DE, the generating function, and other formulas derived 
earlier, one can obtain a variety of relations connecting Legendre polynomials. 

26.5.1 Parity 

The easiest property to obtain is parity, which is the content of the following formula:

$$P_k(-u) = (-1)^k P_k(u). \tag{26.35}$$

This is a direct consequence of the fact that $P_k(u)$ has only even powers of $u$ if $k$ is even, and only odd powers if $k$ is odd.


26.5.2 Recurrence Relation 


Differentiate both sides of Equation (26.30) with respect to $t$ to obtain

$$\frac{u - t}{(1 + t^2 - 2tu)^{3/2}} = \sum_{k=1}^{\infty} k t^{k-1} P_k(u). \tag{26.36}$$

Rewrite the LHS as

$$\frac{u - t}{1 + t^2 - 2tu}\,\frac{1}{\sqrt{1 + t^2 - 2tu}} = \frac{u - t}{1 + t^2 - 2tu}\sum_{k=0}^{\infty} t^k P_k(u), \tag{26.37}$$

where we used (26.30) for the term with the square root. Equating the RHS of (26.37) with the RHS of (26.36) and multiplying the result by $1 + t^2 - 2tu$ yields

$$(t - u)\sum_{k=0}^{\infty} t^k P_k(u) + (1 + t^2 - 2tu)\sum_{k=1}^{\infty} k t^{k-1} P_k(u) = 0$$

12 See Hassani, S. Mathematical Physics: A Modern Introduction to Its Foundations, Springer-Verlag, 1999, Chapter 12, for a discussion of spherical harmonics.
or

$$\sum_{k=0}^{\infty} t^{k+1} P_k(u) - u\sum_{k=0}^{\infty} t^k P_k(u) + \sum_{k=1}^{\infty} k t^{k-1} P_k(u) + \sum_{k=1}^{\infty} k t^{k+1} P_k(u) - 2u\sum_{k=1}^{\infty} k t^k P_k(u) = 0.$$


All the coefficients of powers of $t$ must vanish. To find these coefficients, change the dummy index in each sum so that all sums will have the same power of $t$. So, let $k = n - 1$ in the first and fourth sums, $k = n$ in the second and last sums, and $k = n + 1$ in the third sum. Then the above equation can be written as

$$\sum_{n}\left[P_{n-1}(u) - uP_n(u) + (n+1)P_{n+1}(u) + (n-1)P_{n-1}(u) - 2unP_n(u)\right]t^n = 0,$$

where we have purposely left out the lower limit of summation because different sums start at different initial values of $n$. Since a power series is zero only if all its coefficients are zero, we set the coefficients of the series above equal to zero to obtain

$$(2n+1)uP_n(u) = (n+1)P_{n+1}(u) + nP_{n-1}(u), \qquad n = 1, 2, 3, \ldots. \tag{26.38}$$

Using $P_0(u) = 1$ and $P_1(u) = u$, one can generate all Legendre polynomials from Equation (26.38).

Example 26.5.1. For $n = 1$, Equation (26.38) gives

$$3uP_1(u) = 2P_2(u) + P_0(u) \;\Longrightarrow\; 3u^2 = 2P_2(u) + 1 \;\Longrightarrow\; P_2(u) = \tfrac{1}{2}(3u^2 - 1).$$

For $n = 2$, Equation (26.38) gives

$$5uP_2(u) = 3P_3(u) + 2P_1(u) \;\Longrightarrow\; \tfrac{5}{2}u(3u^2 - 1) = 3P_3(u) + 2u \;\Longrightarrow\; P_3(u) = \tfrac{1}{2}(5u^3 - 3u),$$

and so on. ■
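Bonnet's recurrence (26.38), solved for $P_{n+1}$, gives a short exact procedure for generating the polynomials, as in the example above. A minimal Python sketch, assuming coefficient lists indexed by the power of $u$ (our convention) and exact `Fraction` arithmetic; the function name is ours.

```python
from fractions import Fraction


def legendre_by_recurrence(n_max):
    """[P_0, ..., P_{n_max}] as coefficient lists (index j holds the u**j coefficient).

    Uses (26.38) solved for P_{n+1}:
        P_{n+1} = ((2n+1) u P_n - n P_{n-1}) / (n+1),
    seeded with P_0 = 1 and P_1 = u.
    """
    polys = [[Fraction(1)], [Fraction(0), Fraction(1)]]
    for n in range(1, n_max):
        u_times_pn = [Fraction(0)] + polys[n]           # shift coefficients up one power
        prev_padded = polys[n - 1] + [Fraction(0)] * 2  # same length as u_times_pn
        polys.append([((2 * n + 1) * a - n * b) / (n + 1)
                      for a, b in zip(u_times_pn, prev_padded)])
    return polys[: n_max + 1]


for n, p in enumerate(legendre_by_recurrence(5)):
    print(n, [str(c) for c in p])
```

Each step costs only a pass over the coefficients, which is why the recurrence is usually preferred over the explicit sum for computation.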


The recurrence relation can be used to obtain $P_n(0)$, which is a useful quantity. We quote the result and leave the details as an exercise for the reader. For odd $n$, we have $P_n(0) = 0$. The result for the even case is

$$P_{2n}(0) = (-1)^n\,\frac{(2n-1)!!}{(2n)!!} = (-1)^n\,\frac{(2n)!}{2^{2n}(n!)^2}. \tag{26.39}$$


Example 26.5.2. We can also obtain $P_n(0)$ by letting $u = 0$ in Equation (26.30):

$$(1 + t^2)^{-1/2} = \sum_{k=0}^{\infty} t^k P_k(0).$$


recurrence relation 
for Legendre 
polynomial 
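The binomial expansion of the LHS gives [see Equation (10.15)]

$$
\begin{aligned}
(1 + t^2)^{-1/2} &= 1 + \sum_{n=1}^{\infty}\frac{\left(-\frac{1}{2}\right)\left(-\frac{1}{2} - 1\right)\cdots\left(-\frac{1}{2} - n + 1\right)}{n!}\,(t^2)^n \\
&= 1 + \sum_{n=1}^{\infty}(-1)^n\,\frac{\frac{1}{2}\left(\frac{1}{2} + 1\right)\cdots\left(n - \frac{1}{2}\right)}{n!}\,t^{2n} \\
&= 1 + \sum_{n=1}^{\infty}(-1)^n\,\frac{1\cdot 3\cdots(2n-1)}{2^n\,n!}\,t^{2n}.
\end{aligned}
$$

Comparing this with the RHS of the first equation, we see that $P_n(0) = 0$ when $n$ is odd and that

$$P_{2n}(0) = (-1)^n\,\frac{1\cdot 3\cdots(2n-1)}{2^n\,n!} = (-1)^n\,\frac{(2n-1)!!}{2^n\,n!},$$

which is the same as (26.39) because $2^n n! = (2n)!!$ by Problem 11.1. ■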
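The exercise mentioned before Example 26.5.2 can be sketched quickly. At $u = 0$, Equation (26.38) reduces to $(n+1)P_{n+1}(0) = -nP_{n-1}(0)$, which, together with $P_0(0) = 1$ and $P_1(0) = 0$, generates all the values $P_n(0)$. The Python snippet below (exact fractions; names ours) iterates this and compares the even values with (26.39).

```python
from fractions import Fraction
from math import factorial


def legendre_at_zero(n_max):
    """[P_0(0), ..., P_{n_max}(0)] from (26.38) at u = 0: P_{n+1}(0) = -n P_{n-1}(0) / (n+1)."""
    vals = [Fraction(1), Fraction(0)]              # P_0(0) = 1, P_1(0) = 0
    for n in range(1, n_max):
        vals.append(Fraction(-n, n + 1) * vals[n - 1])
    return vals[: n_max + 1]


def p2n_at_zero(n):
    """Closed form (26.39): (-1)^n (2n)! / (2^(2n) (n!)^2)."""
    return Fraction((-1) ** n * factorial(2 * n), 2 ** (2 * n) * factorial(n) ** 2)


vals = legendre_at_zero(12)
print([str(v) for v in vals[::2]])                 # P_0(0), P_2(0), ..., P_12(0)
```

The odd values vanish automatically, since they are all multiples of $P_1(0) = 0$.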


26.5.3 Orthogonality 

The most useful property of the Legendre polynomials is their orthogonality. 
We have already seen in Chapters 6 and 7 how dot products can be defined 
for polynomials. We now show that Legendre polynomials of different orders 
are necessarily orthogonal once the dot product is defined in terms of suitable 
integrals (also see Example 24.5.3). Write the Legendre DE for P n and P m as 


$$\frac{d}{du}\left[(1-u^2)P_n'(u)\right] + n(n+1)P_n(u) = 0,$$

$$\frac{d}{du}\left[(1-u^2)P_m'(u)\right] + m(m+1)P_m(u) = 0,$$

where the prime indicates differentiation with respect to $u$. Multiply both sides of the first equation by $P_m(u)$ and the second equation by $P_n(u)$ and integrate from $-1$ to $+1$:

$$
\begin{aligned}
\int_{-1}^{1}\frac{d}{du}\left[(1-u^2)P_n'(u)\right]P_m(u)\,du + n(n+1)\int_{-1}^{1}P_n(u)P_m(u)\,du &= 0, \\
\int_{-1}^{1}\frac{d}{du}\left[(1-u^2)P_m'(u)\right]P_n(u)\,du + m(m+1)\int_{-1}^{1}P_m(u)P_n(u)\,du &= 0.
\end{aligned}
\tag{26.40}
$$


Use integration by parts to write the first integral as

$$\int_{-1}^{1}\frac{d}{du}\left[(1-u^2)P_n'(u)\right]P_m(u)\,du = \underbrace{(1-u^2)P_n'(u)P_m(u)\Big|_{-1}^{1}}_{=0\text{ because of }(1-u^2)} - \int_{-1}^{1}(1-u^2)P_n'(u)P_m'(u)\,du.$$
The first integral of the second line of Equation (26.40) gives exactly the same result. Therefore, if we subtract the two equations of (26.40), we obtain

$$[n(n+1) - m(m+1)]\int_{-1}^{1}P_n(u)P_m(u)\,du = 0.$$

It now follows that

Theorem 26.5.3. If $m \neq n$, then $\int_{-1}^{1}P_n(u)P_m(u)\,du = 0$; i.e., if the inner product is defined as an integral from $-1$ to $+1$, then Legendre polynomials of different orders are orthogonal.

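We put this orthogonality relation to immediate use. Square both sides of Equation (26.30), keeping in mind to introduce a new dummy index when multiplying the sums, and integrate the result from $-1$ to $+1$:

$$\int_{-1}^{1}\frac{du}{1 + t^2 - 2tu} = \int_{-1}^{1}\left(\sum_{k=0}^{\infty}t^k P_k(u)\right)\left(\sum_{m=0}^{\infty}t^m P_m(u)\right)du. \tag{26.41}$$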

On the RHS, we switch the order of summation and integration:

$$\text{RHS} = \sum_{k=0}^{\infty}\sum_{m=0}^{\infty} t^{m+k}\underbrace{\int_{-1}^{1} P_k(u)P_m(u)\,du}_{=0\text{ unless }m = k}.$$

As we perform the inner sum, by Theorem 26.5.3, all terms will vanish except one, namely the one with $m = k$. So, the double sum reduces to a single sum:

$$\text{RHS} = \sum_{k=0}^{\infty} t^{2k}\int_{-1}^{1} P_k^2(u)\,du.$$


The integral on the LHS of (26.41) can be done by substituting $y = 1 + t^2 - 2tu$ and $dy = -2t\,du$:

$$\text{LHS} = -\frac{1}{2t}\int_{(1+t)^2}^{(1-t)^2}\frac{dy}{y} = \frac{1}{t}\left[\ln(1+t) - \ln(1-t)\right].$$

The two natural log terms can be expanded using Equation (10.23). The reader may check that

$$\frac{1}{t}\left[\ln(1+t) - \ln(1-t)\right] = 2\sum_{k=0}^{\infty}\frac{t^{2k}}{2k+1}. \tag{26.42}$$


The fact that only even powers of $t$ are present could have been anticipated
because the function on the LHS of Equation (26.42) is even in $t$. Equating
the RHS and the LHS of (26.41), we obtain

$$2\sum_{k=0}^{\infty}\frac{t^{2k}}{2k+1} = \sum_{k=0}^{\infty}t^{2k}\int_{-1}^{1}P_k^2(u)\,du.$$


For these two power series in $t$ to be equal, their coefficients must be equal:

$$\int_{-1}^{1}P_k^2(u)\,du = \frac{2}{2k+1}.$$

Combining this with the orthogonality relation of Theorem 26.5.3 and using
the Kronecker delta introduced in Equation (7.9), we have

$$\int_{-1}^{1}P_m(u)P_n(u)\,du = \frac{2}{2n+1}\,\delta_{mn}. \tag{26.43}$$
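Both steps of this argument can be checked numerically. In the minimal sketch below (our illustration; `norm_sq` is a hypothetical name, and $P_n$ again comes from the Bonnet recurrence, assumed rather than derived), the integrals $\int_{-1}^{1}P_k^2\,du$ are computed exactly and compared with $2/(2k+1)$, and the two sides of (26.41) are then evaluated at the sample value $t = 0.3$.

```python
import math
from fractions import Fraction


def legendre(n):
    """Exact coefficients of P_n (lowest power first) from Bonnet's recurrence."""
    prev, cur = [Fraction(1)], [Fraction(0), Fraction(1)]  # P_0 and P_1
    if n == 0:
        return prev
    for k in range(1, n):
        nxt = [Fraction(0)] * (k + 2)
        for i, a in enumerate(cur):
            nxt[i + 1] += (2 * k + 1) * a
        for i, a in enumerate(prev):
            nxt[i] -= k * a
        prev, cur = cur, [a / (k + 1) for a in nxt]
    return cur


def norm_sq(n):
    """Exact integral of P_n(u)**2 over (-1, +1)."""
    p = legendre(n)
    sq = [Fraction(0)] * (2 * len(p) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(p):
            sq[i + j] += a * b
    return sum(c * Fraction(2, j + 1) for j, c in enumerate(sq) if j % 2 == 0)


# Coefficient of t^(2k): the norm integral equals 2/(2k+1)
for k in range(10):
    assert norm_sq(k) == Fraction(2, 2 * k + 1)

# Both sides of (26.41) at a sample |t| < 1, the series truncated after t^48
t = 0.3
lhs = (math.log(1 + t) - math.log(1 - t)) / t
rhs = sum(t ** (2 * k) * float(norm_sq(k)) for k in range(25))
print(lhs, rhs)
```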


26.5.4 Rodrigues Formula 




We started our discussion of Legendre polynomials by representing them as
infinite series and then truncating the series due to physical restrictions. We
noted that the recursion relation obtained by the Frobenius method gave all
the coefficients of the polynomials in terms of $a_0$ and $a_1$. Later, we found
a “closed” expression for all Legendre polynomials in terms of derivatives of
the generating function, which is a very useful function, as the derivation of
(26.43) demonstrated.

There is another “closed” expression of Legendre polynomials, which we
shall discuss now. This expression is called the Rodrigues formula and is
given by$^{13}$

$$P_n(x) = \frac{1}{2^nn!}\frac{d^n}{dx^n}\left[(x^2-1)^n\right]. \tag{26.44}$$

To see that the RHS indeed gives the $n$th Legendre polynomial, we show
that it satisfies the corresponding Legendre DE. The most elegant way to
show this is to resort to complex analysis, where derivatives are represented
as integrals [see Equation (19.10)]. Thus, for $f(z) = (z^2-1)^n$, the Cauchy
integral formula gives

$$(z^2-1)^n = \frac{1}{2\pi i}\oint_C\frac{(\zeta^2-1)^n}{\zeta-z}\,d\zeta,$$

where $C$ is a closed contour encircling $z$ counterclockwise, and Equations (19.10) and (26.44) yield

$$P_n(z) = \frac{1}{2^nn!}\frac{d^n}{dz^n}\left[(z^2-1)^n\right] = \frac{1}{2^n(2\pi i)}\oint_C\frac{(\zeta^2-1)^n}{(\zeta-z)^{n+1}}\,d\zeta.$$


To find $P_n'(z)$ and $P_n''(z)$, we differentiate the integral, carrying the derivative
inside and letting it differentiate the denominator:

$$\frac{dP_n}{dz} = \frac{1}{2^n(2\pi i)}\oint_C\frac{d}{dz}\left[\frac{(\zeta^2-1)^n}{(\zeta-z)^{n+1}}\right]d\zeta = \frac{n+1}{2^n(2\pi i)}\oint_C\frac{(\zeta^2-1)^n}{(\zeta-z)^{n+2}}\,d\zeta,$$

$$\frac{d^2P_n}{dz^2} = \frac{d}{dz}\left(\frac{dP_n}{dz}\right) = \frac{(n+1)(n+2)}{2^n(2\pi i)}\oint_C\frac{(\zeta^2-1)^n}{(\zeta-z)^{n+3}}\,d\zeta.$$


$^{13}$The fact that we are using $x$, rather than $u$, as the argument of the Legendre polynomial
should not cause any confusion.
Substituting these expressions in the DE, the reader may check that

$$(1-z^2)\frac{d^2P_n}{dz^2} - 2z\frac{dP_n}{dz} + n(n+1)P_n = \frac{n+1}{2^n(2\pi i)}\oint_C\frac{(\zeta^2-1)^n\left[n\zeta^2 - 2(n+1)\zeta z + n + 2\right]}{(\zeta-z)^{n+3}}\,d\zeta.$$

The reader may also verify that

$$\frac{(\zeta^2-1)^n\left[n\zeta^2 - 2(n+1)\zeta z + n + 2\right]}{(\zeta-z)^{n+3}} = \frac{d}{d\zeta}\left[\frac{(\zeta^2-1)^{n+1}}{(\zeta-z)^{n+2}}\right],$$

so that the integrand is the derivative of a single-valued function. Since the contour of
integration is closed, the lower and upper limits of integration coincide and
the integral vanishes. So, the Rodrigues formula indeed yields Legendre polynomials.
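The Rodrigues formula can also be tested directly, independently of the contour-integral argument. This sketch (hypothetical function names; the Bonnet recurrence is assumed as the reference) expands $(x^2-1)^n$ by the binomial theorem, differentiates it $n$ times term by term, divides by $2^nn!$, and compares the result with the recurrence, using exact fractions so that the comparison is exact.

```python
from fractions import Fraction
from math import comb, factorial


def legendre_recurrence(n):
    """Exact coefficients of P_n (lowest power first) from Bonnet's recurrence."""
    prev, cur = [Fraction(1)], [Fraction(0), Fraction(1)]  # P_0 and P_1
    if n == 0:
        return prev
    for k in range(1, n):
        nxt = [Fraction(0)] * (k + 2)
        for i, a in enumerate(cur):
            nxt[i + 1] += (2 * k + 1) * a
        for i, a in enumerate(prev):
            nxt[i] -= k * a
        prev, cur = cur, [a / (k + 1) for a in nxt]
    return cur


def legendre_rodrigues(n):
    """Coefficients of (1 / (2^n n!)) d^n/dx^n (x^2 - 1)^n, lowest power first."""
    # binomial theorem: (x^2 - 1)^n = sum_j C(n, j) (-1)^(n - j) x^(2j)
    poly = [Fraction(0)] * (2 * n + 1)
    for j in range(n + 1):
        poly[2 * j] = Fraction(comb(n, j) * (-1) ** (n - j))
    for _ in range(n):  # differentiate n times: a_i x^i -> i a_i x^(i-1)
        poly = [i * poly[i] for i in range(1, len(poly))]
    scale = Fraction(1, 2 ** n * factorial(n))
    return [scale * a for a in poly]


for n in range(10):
    assert legendre_rodrigues(n) == legendre_recurrence(n)
```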


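Example 26.5.4. As an illustration of the use of the Rodrigues formula, let us
evaluate the integral

$$I = \int_{-1}^{1}x^kP_n(x)\,dx \quad\text{for } k \le n.$$

The procedure is to replace $P_n(x)$ by the RHS of Equation (26.44) and integrate by
parts repeatedly. After one integration by parts, we get

$$\begin{aligned}I &= \frac{1}{2^nn!}\int_{-1}^{1}x^k\frac{d^n}{dx^n}\left[(x^2-1)^n\right]dx\\ &= \frac{1}{2^nn!}\left\{x^k\frac{d^{n-1}}{dx^{n-1}}\left[(x^2-1)^n\right]\right\}\bigg|_{-1}^{1} - \frac{1}{2^nn!}\int_{-1}^{1}kx^{k-1}\frac{d^{n-1}}{dx^{n-1}}\left[(x^2-1)^n\right]dx.\end{aligned}$$

The first term on the RHS of the second line is zero because each differentiation
reduces the power of $(x^2-1)^n$ by at most one unit. So after $n-1$ differentiations,
we get a sum of terms each having $(x^2-1)$ raised to various powers, with the lowest
power being one. All these terms vanish at $x = 1$ as well as at $x = -1$. Continuing
the integration by parts, we get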


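$$I = -\frac{k}{2^nn!}\underbrace{\left\{x^{k-1}\frac{d^{n-2}}{dx^{n-2}}\left[(x^2-1)^n\right]\right\}\bigg|_{-1}^{1}}_{=0\text{ for same reason as above}} + \frac{(-1)^2}{2^nn!}\int_{-1}^{1}k(k-1)x^{k-2}\frac{d^{n-2}}{dx^{n-2}}\left[(x^2-1)^n\right]dx.$$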


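After $k$ integrations by parts, we obtain

$$I = \frac{(-1)^kk!}{2^nn!}\int_{-1}^{1}\frac{d^{n-k}}{dx^{n-k}}\left[(x^2-1)^n\right]dx,$$

because after $k$ differentiations $x^k$ yields $k!$. Now, if $k < n$, the integral vanishes for
the same reason as above: the integrand is the derivative of $\frac{d^{n-k-1}}{dx^{n-k-1}}\left[(x^2-1)^n\right]$, which is zero at $x = \pm1$. If $n = k$, no differentiation will be left, and we have

$$I = \frac{(-1)^nn!}{2^nn!}\int_{-1}^{1}(x^2-1)^n\,dx.$$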
Problem 26.14 shows how to evaluate the final integral and obtain

$$\int_{-1}^{1}(x^2-1)^n\,dx = (-1)^n\frac{2^{2n+1}(n!)^2}{(2n+1)!}.$$

Therefore,

$$I = \frac{2^{n+1}(n!)^2}{(2n+1)!}.$$

We summarize the above derivation:

$$\int_{-1}^{1}x^kP_n(x)\,dx = \begin{cases}0 & \text{if } k < n,\\[4pt] \dfrac{2^{n+1}(n!)^2}{(2n+1)!} & \text{if } k = n.\end{cases} \tag{26.45}$$

If instead of $x^k$ we have a general polynomial of order $k$ in $x$ with $n > k$, the
integral will still vanish, because such a polynomial is a linear combination of powers $x^j$ with $j \le k < n$. ■

The result of the preceding example is summarized as

Box 26.5.1. Any polynomial of degree less than $n$ is orthogonal to $P_n$.
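Equation (26.45) lends itself to an exact check. In the sketch below, `moment(k, n)` is a hypothetical helper that integrates $x^kP_n(x)$ over $(-1,+1)$ term by term, with $P_n$ built from the Bonnet recurrence (assumed, not derived in this section).

```python
from fractions import Fraction
from math import factorial


def legendre(n):
    """Exact coefficients of P_n (lowest power first) from Bonnet's recurrence."""
    prev, cur = [Fraction(1)], [Fraction(0), Fraction(1)]  # P_0 and P_1
    if n == 0:
        return prev
    for k in range(1, n):
        nxt = [Fraction(0)] * (k + 2)
        for i, a in enumerate(cur):
            nxt[i + 1] += (2 * k + 1) * a
        for i, a in enumerate(prev):
            nxt[i] -= k * a
        prev, cur = cur, [a / (k + 1) for a in nxt]
    return cur


def moment(k, n):
    """Exact integral of x^k P_n(x) over (-1, +1)."""
    # x^k * a_j x^j integrates to 2 a_j / (j + k + 1) when j + k is even, else 0
    return sum(a * Fraction(2, j + k + 1)
               for j, a in enumerate(legendre(n)) if (j + k) % 2 == 0)


# Equation (26.45): zero for k < n, closed form for k = n
for n in range(9):
    for k in range(n):
        assert moment(k, n) == 0
    assert moment(n, n) == Fraction(2 ** (n + 1) * factorial(n) ** 2,
                                    factorial(2 * n + 1))
```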


26.6 Expansions in Legendre Polynomials 

The orthogonality of Legendre polynomials, like the orthogonality of the Fourier
trigonometric functions, makes them very useful for expanding functions
defined in the interval $(-1,+1)$. Let $f(x)$ be such a function. Then we write

$$f(x) = \sum_{n=0}^{\infty}c_nP_n(x) \tag{26.46}$$

and seek to find $c_n$. The coefficients can be obtained by multiplying both sides of
the series by $P_m(x)$ and integrating from $-1$ to $+1$. On the LHS, we get
$\int_{-1}^{1}f(x)P_m(x)\,dx$, and on the RHS

$$\int_{-1}^{1}\left(\sum_{n=0}^{\infty}c_nP_n(x)\right)P_m(x)\,dx = \sum_{n=0}^{\infty}c_n\underbrace{\int_{-1}^{1}P_n(x)P_m(x)\,dx}_{[2/(2n+1)]\delta_{mn}\text{ by (26.43)}} = c_m\frac{2}{2m+1}.$$

Equating the RHS and the LHS, we obtain

$$c_m = \frac{2m+1}{2}\int_{-1}^{1}f(x)P_m(x)\,dx \quad\text{or}\quad c_n = \frac{2n+1}{2}\int_{-1}^{1}f(x)P_n(x)\,dx. \tag{26.47}$$

Equations (26.46) and (26.47) give a procedure for expanding an arbitrary
function defined in the interval $(-1,+1)$ in terms of Legendre polynomials.
If $f(x)$ happens to be a polynomial of degree $k$, then it can be written
as a finite sum of Legendre polynomials of degree $k$ and less. In fact, for
$f(x) = x^k$, we have

$$c_n = \frac{2n+1}{2}\int_{-1}^{1}x^kP_n(x)\,dx = 0 \quad\text{for } n > k$$

by Box 26.5.1. Thus the coefficients in the sum (26.46) beyond $k$ are all zero.
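Equation (26.47) together with Box 26.5.1 gives a finite procedure for rewriting a power of $x$ in Legendre polynomials. A minimal sketch, with hypothetical helper names and the Bonnet recurrence assumed for $P_n$, computes $c_0,\dots,c_k$ for $f(x) = x^k$ exactly and then checks that $\sum_n c_nP_n(x)$ rebuilds $x^k$ coefficient by coefficient.

```python
from fractions import Fraction


def legendre(n):
    """Exact coefficients of P_n (lowest power first) from Bonnet's recurrence."""
    prev, cur = [Fraction(1)], [Fraction(0), Fraction(1)]  # P_0 and P_1
    if n == 0:
        return prev
    for k in range(1, n):
        nxt = [Fraction(0)] * (k + 2)
        for i, a in enumerate(cur):
            nxt[i + 1] += (2 * k + 1) * a
        for i, a in enumerate(prev):
            nxt[i] -= k * a
        prev, cur = cur, [a / (k + 1) for a in nxt]
    return cur


def legendre_coefficients_of_power(k):
    """c_0, ..., c_k with x^k = sum_n c_n P_n(x), via Equation (26.47)."""
    coeffs = []
    for n in range(k + 1):
        integral = sum(a * Fraction(2, j + k + 1)
                       for j, a in enumerate(legendre(n)) if (j + k) % 2 == 0)
        coeffs.append(Fraction(2 * n + 1, 2) * integral)
    return coeffs


k = 4
c = legendre_coefficients_of_power(k)
rebuilt = [Fraction(0)] * (k + 1)
for n, cn in enumerate(c):           # accumulate sum_n c_n P_n(x)
    for j, a in enumerate(legendre(n)):
        rebuilt[j] += cn * a
assert rebuilt == [0, 0, 0, 0, 1]    # exactly x^4
```

For $k = 4$ the coefficients come out as $\frac15, 0, \frac47, 0, \frac{8}{35}$, i.e., $x^4 = \frac15P_0 + \frac47P_2 + \frac{8}{35}P_4$.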

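Example 26.6.1. We want to find the Legendre expansion of a function $f(x)$
defined as

$$f(x) = \begin{cases}V_0 & \text{if } 0 < x < 1,\\ -V_0 & \text{if } -1 < x < 0.\end{cases}$$

To find the coefficients of expansion, we use Equation (26.47):

$$\begin{aligned}c_n &= \frac{2n+1}{2}\int_{-1}^{1}f(x)P_n(x)\,dx = \frac{2n+1}{2}\int_{-1}^{0}\underbrace{f(x)}_{=-V_0}P_n(x)\,dx + \frac{2n+1}{2}\int_{0}^{1}\underbrace{f(x)}_{=+V_0}P_n(x)\,dx\\ &= \frac{2n+1}{2}V_0\left[-\int_{-1}^{0}P_n(x)\,dx + \int_{0}^{1}P_n(x)\,dx\right].\end{aligned} \tag{26.48}$$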


In the first integral of the last line, we make the substitution $x = -y$ so that

$$\int_{-1}^{0}P_n(x)\,dx = \int_{+1}^{0}P_n(-y)(-dy) = \int_{0}^{1}P_n(-y)\,dy = (-1)^n\int_{0}^{1}P_n(x)\,dx,$$

where we used (26.35) and, in the last equality, we changed the dummy variable of
integration from $y$ to $x$ (Section 3.2). Inserting this in (26.48), we obtain

$$c_n = \frac{2n+1}{2}V_0\left[1 - (-1)^n\right]\int_{0}^{1}P_n(x)\,dx = \frac{2n+1}{2}\begin{cases}0 & \text{if } n \text{ is even},\\ 2V_0\displaystyle\int_{0}^{1}P_{2k+1}(x)\,dx & \text{if } n = 2k+1,\end{cases}$$

where we have written the odd $n$ as $2k+1$ for $k = 0, 1, \dots$.

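It remains to evaluate the integral of a Legendre polynomial of odd order in the
interval $(0,1)$. To this end, we use the Rodrigues formula:

$$\begin{aligned}\int_{0}^{1}P_{2k+1}(x)\,dx &= \frac{1}{2^{2k+1}(2k+1)!}\int_{0}^{1}\frac{d}{dx}\left\{\frac{d^{2k}}{dx^{2k}}\left[(x^2-1)^{2k+1}\right]\right\}dx\\ &= \frac{1}{2^{2k+1}(2k+1)!}\left\{\frac{d^{2k}}{dx^{2k}}\left[(x^2-1)^{2k+1}\right]\bigg|_{x=1} - \frac{d^{2k}}{dx^{2k}}\left[(x^2-1)^{2k+1}\right]\bigg|_{x=0}\right\}.\end{aligned}$$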


The first term gives zero because there are not enough differentiations to
get rid of all factors of $(x^2-1)$. For the second term, we note that $(x^2-1)^{2k+1}$ is
a polynomial in $x$ whose derivatives of various orders consist of powers of $x$. These
powers will give zero at $x = 0$ except for the constant term (of zeroth power). So,
let us use the binomial expansion for $(x^2-1)^{2k+1}$, which is equal to $-(1-x^2)^{2k+1}$:

$$\frac{d^{2k}}{dx^{2k}}\left[(x^2-1)^{2k+1}\right] = -\frac{d^{2k}}{dx^{2k}}\left[\sum_{j=0}^{2k+1}\frac{(2k+1)!}{j!(2k+1-j)!}(-x^2)^j\right] = -\sum_{j=0}^{2k+1}\frac{(2k+1)!}{j!(2k+1-j)!}(-1)^j\frac{d^{2k}}{dx^{2k}}\left(x^{2j}\right),$$

whose constant term is obtained when $j = k$. All the other terms of the sum vanish at $x = 0$,
either because of too many differentiations (when $j < k$, the $2k$ derivatives annihilate $x^{2j}$)
or too few differentiations (when $j > k$, a power of $x$ remains, which evaluates to zero at $x = 0$). Therefore,


$$\frac{d^{2k}}{dx^{2k}}\left[(x^2-1)^{2k+1}\right]\bigg|_{x=0} = -\frac{(2k+1)!}{k!(k+1)!}(-1)^k\frac{d^{2k}}{dx^{2k}}\left(x^{2k}\right) = \frac{(2k+1)!}{k!(k+1)!}(-1)^{k+1}(2k)!$$

and

$$\int_{0}^{1}P_{2k+1}(x)\,dx = -\frac{1}{2^{2k+1}(2k+1)!}\cdot\frac{(2k+1)!}{k!(k+1)!}(-1)^{k+1}(2k)! = \frac{(-1)^k(2k)!}{2^{2k+1}k!(k+1)!}. \tag{26.49}$$


Finally, we can write the coefficient $c_{2k+1}$ as

$$c_{2k+1} = \frac{2(2k+1)+1}{2}\,2V_0\int_{0}^{1}P_{2k+1}(x)\,dx = \frac{(-1)^k(4k+3)(2k)!}{2^{2k+1}k!(k+1)!}V_0,$$

with $c_n = 0$ for even $n$. The final expansion series can now be given:

$$f(x) = \begin{cases}V_0 & \text{if } 0 < x < 1\\ -V_0 & \text{if } -1 < x < 0\end{cases} = V_0\sum_{k=0}^{\infty}\frac{(-1)^k(4k+3)(2k)!}{2^{2k+1}k!(k+1)!}P_{2k+1}(x) = V_0\left[\tfrac{3}{2}P_1(x) - \tfrac{7}{8}P_3(x) + \tfrac{11}{16}P_5(x) - \cdots\right].$$

■
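The closed form (26.49) and the coefficients just found can be cross-checked against a direct, exact integration of each $P_{2k+1}$ over $(0,1)$. The sketch below is our illustration (hypothetical names; $V_0 = 1$ for convenience; Bonnet recurrence assumed for $P_n$).

```python
from fractions import Fraction
from math import factorial


def legendre(n):
    """Exact coefficients of P_n (lowest power first) from Bonnet's recurrence."""
    prev, cur = [Fraction(1)], [Fraction(0), Fraction(1)]  # P_0 and P_1
    if n == 0:
        return prev
    for k in range(1, n):
        nxt = [Fraction(0)] * (k + 2)
        for i, a in enumerate(cur):
            nxt[i + 1] += (2 * k + 1) * a
        for i, a in enumerate(prev):
            nxt[i] -= k * a
        prev, cur = cur, [a / (k + 1) for a in nxt]
    return cur


def integral_0_to_1(p):
    """Exact integral over (0, 1) of a polynomial with coefficients p (lowest first)."""
    return sum(a * Fraction(1, j + 1) for j, a in enumerate(p))


def closed_form(k):
    """Right-hand side of Equation (26.49)."""
    return Fraction((-1) ** k * factorial(2 * k),
                    2 ** (2 * k + 1) * factorial(k) * factorial(k + 1))


def step_coefficient(n, V0=1):
    """c_n of Example 26.6.1 from Equation (26.47)."""
    if n % 2 == 0:
        return Fraction(0)
    return Fraction(2 * n + 1, 2) * 2 * V0 * integral_0_to_1(legendre(n))


for k in range(8):
    assert integral_0_to_1(legendre(2 * k + 1)) == closed_form(k)
    assert step_coefficient(2 * k + 1) == (4 * k + 3) * closed_form(k)
print([step_coefficient(n) for n in (1, 3, 5)])  # [Fraction(3, 2), Fraction(-7, 8), Fraction(11, 16)]
```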

Example 26.6.2. We can easily obtain the Legendre expansion of the Dirac delta
function. The expansion coefficients are given by

$$c_n = \frac{2n+1}{2}\int_{-1}^{1}f(x)P_n(x)\,dx = \frac{2n+1}{2}\int_{-1}^{1}\delta(x)P_n(x)\,dx = \frac{2n+1}{2}P_n(0).$$

From Equation (26.39) and the discussion preceding it, we can find all values of
$P_n(0)$. Substituting these values in the above equation, we conclude that $c_n = 0$ if
$n$ is odd, and

$$c_{2k} = \frac{4k+1}{2}\,\frac{(-1)^k(2k)!}{2^{2k}(k!)^2}.$$

It now follows that

$$\delta(x) = \sum_{k=0}^{\infty}(-1)^k\frac{(4k+1)(2k)!}{2^{2k+1}(k!)^2}P_{2k}(x).$$

■
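The coefficients $c_{2k}$ can be cross-checked in the same spirit: $P_n(0)$ is simply the constant term of $P_n$, so $\frac{2n+1}{2}P_n(0)$ can be computed exactly and compared with the closed form. The sketch is our illustration with hypothetical names (Bonnet recurrence assumed); it compares coefficients only, since the series for $\delta(x)$ is meaningful only under an integral sign.

```python
from fractions import Fraction
from math import factorial


def legendre(n):
    """Exact coefficients of P_n (lowest power first) from Bonnet's recurrence."""
    prev, cur = [Fraction(1)], [Fraction(0), Fraction(1)]  # P_0 and P_1
    if n == 0:
        return prev
    for k in range(1, n):
        nxt = [Fraction(0)] * (k + 2)
        for i, a in enumerate(cur):
            nxt[i + 1] += (2 * k + 1) * a
        for i, a in enumerate(prev):
            nxt[i] -= k * a
        prev, cur = cur, [a / (k + 1) for a in nxt]
    return cur


def delta_coefficient_direct(n):
    """c_n = (2n+1)/2 * P_n(0); P_n(0) is the constant term of P_n."""
    return Fraction(2 * n + 1, 2) * legendre(n)[0]


def delta_coefficient_closed(n):
    """Zero for odd n; otherwise the c_{2k} of Example 26.6.2."""
    if n % 2 == 1:
        return Fraction(0)
    k = n // 2
    return Fraction((-1) ** k * (4 * k + 1) * factorial(2 * k),
                    2 ** (2 * k + 1) * factorial(k) ** 2)


for n in range(12):
    assert delta_coefficient_direct(n) == delta_coefficient_closed(n)
```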
26.7 Physical Examples


The most common physical problems involving Laplace’s equation are those
from electrostatics in empty space and steady-state heat transfer. In each
case, a surface is held at some (not necessarily uniform) potential or temperature,
and the potential or temperature is sought in regions away from
the surface. In the present context, these surfaces are typically (portions of)
spheres.

Example 26.7.1. Two solid heat-conducting hemispheres of radius $a$, separated
by a very small insulating gap, form a sphere. The two halves of the sphere are
in contact, on the outside, with two (infinite) heat baths at temperatures $T_0$ and
$-T_0$ [Figure 26.1(a)]. We want to find the temperature distribution $T(r,\theta,\varphi)$ inside
the sphere. We choose a spherical coordinate system in which the origin coincides
with the center of the sphere and the polar axis is perpendicular to the equatorial
plane. The hemisphere with temperature $T_0$ is assumed to constitute the northern
hemisphere.

Since the problem has azimuthal symmetry, $T$ is independent of $\varphi$, and we can
immediately write the general solution from Equation (26.29). However, since the
origin is in the region of interest, we need to exclude all negative powers of $r$. This
is accomplished by setting all the $B$ coefficients equal to zero. Thus, we have

$$T(r,\theta) = \sum_{n=0}^{\infty}A_nr^nP_n(\cos\theta). \tag{26.50}$$

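It remains to calculate the constants $A_n$. This is done by noting that

$$T(a,\theta) = \begin{cases}T_0 & \text{if } 0 < \theta < \pi/2,\\ -T_0 & \text{if } \pi/2 < \theta < \pi.\end{cases}$$

In terms of $u = \cos\theta$, this is written as

$$T(a,u) = \begin{cases}-T_0 & \text{if } -1 < u < 0,\\ T_0 & \text{if } 0 < u < 1.\end{cases}$$

Substituting this in Equation (26.50), we obtain

$$T(a,u) = \begin{cases}-T_0 & \text{if } -1 < u < 0\\ T_0 & \text{if } 0 < u < 1\end{cases} = \sum_{n=0}^{\infty}A_na^nP_n(u), \tag{26.51}$$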


which, except for using $u$ instead of $x$ (and $T_0$ instead of $V_0$), is entirely equivalent to the expansion of
Example 26.6.1, where we found that even coefficients are absent and

$$c_{2k+1} = A_{2k+1}a^{2k+1} = \frac{(-1)^k(4k+3)(2k)!}{2^{2k+1}k!(k+1)!}T_0.$$

Finding $A_{2k+1}$ from this equation and inserting the result in (26.50) yields

$$T(r,\theta) = T_0\sum_{k=0}^{\infty}\frac{(-1)^k(4k+3)(2k)!}{2^{2k+1}k!(k+1)!}\left(\frac{r}{a}\right)^{2k+1}P_{2k+1}(\cos\theta), \tag{26.52}$$




where we have substituted $\cos\theta$ for $u$.
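Equation (26.52) is easy to evaluate numerically. The sketch below is our own illustration (the function `temperature` is hypothetical, with $a = T_0 = 1$ by default): it generates $P_n(\cos\theta)$ with the Bonnet recurrence and the integrals (26.49) through the ratio $-(2k+1)/[2(k+2)]$ of successive values, which follows from (26.49) by direct division. Because the boundary temperature is discontinuous at the equator, the truncated series converges slowly near $r = a$, so only rough agreement with $\pm T_0$ should be expected there.

```python
import math


def temperature(r, theta, a=1.0, T0=1.0, n_terms=400):
    """Partial sum of Equation (26.52) keeping the first n_terms odd orders."""
    u = math.cos(theta)
    p_prev, p_cur, n = 1.0, u, 1   # P_0(u), P_1(u); p_cur holds P_n(u)
    a_k = 0.5                      # integral of P_1 over (0, 1): Eq. (26.49) with k = 0
    total = 0.0
    for k in range(n_terms):
        # here n = 2k + 1 and p_cur = P_{2k+1}(u)
        total += (4 * k + 3) * a_k * (r / a) ** (2 * k + 1) * p_cur
        for _ in range(2):         # Bonnet recurrence: advance to P_{2k+3}(u)
            p_prev, p_cur = p_cur, ((2 * n + 1) * u * p_cur - n * p_prev) / (n + 1)
            n += 1
        a_k *= -(2 * k + 1) / (2 * (k + 2))   # ratio of successive values of (26.49)
    return T0 * total


print(temperature(0.0, 0.3))               # center of the sphere: T = 0
print(temperature(0.999, 0.5))             # just inside the northern hemisphere: near T0
print(temperature(0.999, math.pi - 0.5))  # mirror point: near -T0
```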
Figure 26.1: (a) Two heat-conducting hemispheres held at two different temperatures.
(b) Two electrically conducting hemispheres held at two different potentials. The upper
hemispheres have the polar angle range $0 < \theta < \pi/2$ or $0 < \cos\theta < 1$, and the lower
hemispheres have the range $\pi/2 < \theta < \pi$ or $-1 < \cos\theta < 0$.




Example 26.7.2. Consider two electrically conducting hemispheres of radius $a$
separated by a small insulating gap at the equator. The upper hemisphere is held
at potential $V_0$ and the lower one at $-V_0$, as shown in Figure 26.1(b). We want to
find the potential at points outside the resulting sphere. Since the potential must
vanish at infinity, we expect the first term in Equation (26.29) to be absent, i.e.,
$A_k = 0$. To find $B_k$, substitute $a$ for $r$ in (26.29), and let $\cos\theta = u$. Then,

$$\Phi(a,u) = \sum_{k=0}^{\infty}\frac{B_k}{a^{k+1}}P_k(u),$$

where

$$\Phi(a,u) = \begin{cases}-V_0 & \text{if } -1 < u < 0,\\ +V_0 & \text{if } 0 < u < 1.\end{cases}$$

The calculation of the coefficients $c_k = B_k/a^{k+1}$ is identical to that of Example 26.6.1. Thus,
$c_k = 0$ for even $k$ and

$$c_{2m+1} = \frac{B_{2m+1}}{a^{2m+2}} = (-1)^m\frac{(4m+3)(2m)!}{2^{2m+1}(m+1)!\,m!}V_0 \;\Longrightarrow\; B_{2m+1} = \frac{(-1)^m(4m+3)(2m)!}{2^{2m+1}m!(m+1)!}a^{2m+2}V_0.$$

Having found the coefficients, we can write the potential:

$$\Phi(r,\theta) = V_0\sum_{m=0}^{\infty}\frac{(-1)^m(4m+3)(2m)!}{2^{2m+1}m!(m+1)!}\left(\frac{a}{r}\right)^{2m+2}P_{2m+1}(\cos\theta), \tag{26.53}$$

where $\cos\theta$ has been restored. Equation (26.53) is the multipole expansion of the
potential of the two hemispheres. It is interesting to note that the monopole term
(the term with a single power of $r$ in the denominator) is absent. It follows from
Equation (10.33) that the total charge on the two hemispheres must be zero. This is
consistent with the symmetry of the problem, from which we expect surface
charge densities of equal magnitude and opposite signs on the two hemispheres. ■


Example 26.7.3. As yet another example of the solution of Laplace’s equation
in spherical coordinates, consider a grounded neutral conducting sphere of radius
$a$ placed in an originally uniform electric field $E_0$, which is assumed to be infinite
in extent (see Figure 26.2). We want to find the electrostatic potential everywhere
outside the sphere. Choosing the field to be in the positive $z$-direction and placing
the center of the sphere at the origin, we will have a problem that is azimuthally
symmetric. The general solution is therefore given by Equation (26.29). The boundaries
outside the sphere consist of the sphere itself as well as infinity. The electric
field at infinity is the original uniform field, because the field due to the charges
induced on the sphere vanishes at infinity. The potential of this field (at infinity)
can be deduced from$^{14}$

$$\mathbf{E} = E_0\mathbf{e}_z = -\nabla\Phi \;\Longrightarrow\; \frac{\partial\Phi}{\partial z} = -E_0,\quad \frac{\partial\Phi}{\partial x} = 0,\quad \frac{\partial\Phi}{\partial y} = 0.$$

Thus, the potential at infinity is independent of $x$ and $y$, and can be written as

$$\Phi(r,\theta) = -E_0z = -E_0r\cos\theta = -E_0rP_1(\cos\theta) \quad\text{for } r \to \infty.$$

As $r \to \infty$, the $B$ terms in Equation (26.29) will go to zero, and we must have the
“limiting” equality

$$\sum_{k=0}^{\infty}A_kr^kP_k(u) \to -E_0rP_1(u).$$



Figure 26.2: The electric field in the vicinity of a sphere placed in an external uniform
field will change, but the field far away from the sphere will remain almost uniform.

$^{14}$We could express the gradient in terms of spherical coordinates, but, as the reader will
note, the initial manipulation is noticeably easier in Cartesian coordinates.


The orthogonality of the Legendre polynomials requires the coefficients on both sides
to be equal. This gives

$$A_k = 0 \text{ for } k = 0 \text{ and } k \ge 2, \qquad A_1 = -E_0.$$


The $B$’s are obtained by applying the boundary condition of the sphere itself,
namely the fact that it is grounded. This means that $\Phi(a,\theta) = 0$, or

$$0 = A_1aP_1(u) + \sum_{k=0}^{\infty}\frac{B_k}{a^{k+1}}P_k(u) = \frac{B_0}{a} + \left(\frac{B_1}{a^2} - E_0a\right)P_1(u) + \sum_{k=2}^{\infty}\frac{B_k}{a^{k+1}}P_k(u).$$

Again orthogonality of the Legendre polynomials requires the coefficient of each
polynomial to vanish. This yields

$$B_0 = 0, \qquad B_1 = E_0a^3, \qquad B_k = 0 \text{ for } k \ge 2.$$


Inserting all these coefficients in Equation (26.29), we obtain

$$\Phi(r,\theta) = -E_0\left(r - \frac{a^3}{r^2}\right)P_1(\cos\theta). \tag{26.54}$$


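Because of the simplicity of the expression for the potential, we can evaluate some
other physical quantities of interest. For example, the electric field at all points
outside the sphere is

$$\mathbf{E} = -\nabla\Phi = -\mathbf{e}_r\frac{\partial\Phi}{\partial r} - \mathbf{e}_\theta\frac{1}{r}\frac{\partial\Phi}{\partial\theta},$$

$$E_r = -\frac{\partial\Phi}{\partial r} = E_0\left(1 + \frac{2a^3}{r^3}\right)\cos\theta, \qquad E_\theta = -\frac{1}{r}\frac{\partial\Phi}{\partial\theta} = -E_0\left(1 - \frac{a^3}{r^3}\right)\sin\theta.$$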


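This is the sum of the original uniform field

$$E_0(\cos\theta\,\mathbf{e}_r - \sin\theta\,\mathbf{e}_\theta) = E_0\mathbf{e}_z$$

and the field due to the charges induced on the sphere

$$\mathbf{E}_{\text{sph}} = \frac{a^3E_0}{r^3}(2\cos\theta\,\mathbf{e}_r + \sin\theta\,\mathbf{e}_\theta),$$

which (see Example 16.2.1) is the field of an electric dipole with dipole moment

$$p = \frac{a^3E_0}{k_e} = 4\pi\epsilon_0a^3E_0.$$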


It is interesting to note that at $r = a$, the only nonvanishing component of
the field is $E_r$. This is consistent with the known fact that electrostatic fields
are perpendicular to conducting surfaces. Furthermore, this perpendicular field is
related to the surface charge density by $E = \sigma/\epsilon_0$. Therefore,

$$\sigma = \epsilon_0E_r\big|_{r=a} = 3\epsilon_0E_0\cos\theta,$$

indicating an accumulation of positive charge on the “upper” (right in the figure)
hemisphere and an identical distribution of negative charge on the “lower” (left in
the figure) hemisphere. ■
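A quick numerical check of (26.54) and of the field components is sketched below, with the illustrative (assumed) values $E_0 = a = 1$ and hypothetical function names. The gradient is taken by central differences, so the check does not merely restate the formulas: the potential vanishes on the sphere, $E_\theta$ vanishes at $r = a$, $E_r$ there equals $3E_0\cos\theta$ (so that $\sigma = 3\epsilon_0E_0\cos\theta$), and far away the field returns to $E_0\mathbf{e}_z$.

```python
import math

E0, a = 1.0, 1.0   # illustrative values of the field strength and sphere radius


def phi(r, theta):
    """Potential (26.54) outside the grounded sphere."""
    return -E0 * (r - a**3 / r**2) * math.cos(theta)


def field(r, theta, h=1e-6):
    """(E_r, E_theta) = -grad(phi), using central differences in r and theta."""
    E_r = -(phi(r + h, theta) - phi(r - h, theta)) / (2 * h)
    E_theta = -(phi(r, theta + h) - phi(r, theta - h)) / (2 * h * r)
    return E_r, E_theta


theta = 0.7
E_r, E_theta = field(a, theta)
print(phi(a, theta), E_r, E_theta)       # potential 0, E_r = 3 E0 cos(theta), E_theta = 0
E_r_far, E_theta_far = field(1e3 * a, theta, h=1e-3)
print(E_r_far, E_theta_far)              # close to E0 cos(theta) and -E0 sin(theta)
```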
26.8 Problems


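26.1. Show that by writing $P(u) = \Theta(\theta)$, with $u = \cos\theta$, and using the
chain rule, the second equation of (26.2) becomes

$$-\frac{1}{\sin\theta}\frac{d}{d\theta}\left[(1-u^2)\frac{dP}{du}\right] + \alpha P = 0, \quad\text{i.e.,}\quad \frac{d}{du}\left[(1-u^2)\frac{dP}{du}\right] + \alpha P = 0.$$

26.2. Choose a solution of the form $u^r\sum_{n=0}^{\infty}a_nu^n$ for the Legendre DE,
assume that $a_0$ and $a_1$ are both nonzero, and show that the only solution for
$r$ is $r = 0$.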


26.3. Derive Equation (26.15). 

26.4. Derive Equations (26.18), (26.19), and (26.20). 

26.5. Derive Equations (26.21) and (26.22) and show that they can both be 
written as (26.23). 

26.6. Show by mathematical induction (or otherwise) that Equation (26.24)
satisfies $P_n(1) = 1$.

26.7. Show that Legendre polynomials and the hypergeometric function are
related via (26.25) and (26.26).

26.8. Suppose that $Q$ represents electric charge. Show that in (26.34) $Q_0$
is the total charge and $Q_1$ is the dot product of $\mathbf{e}_r$ and the electric dipole
moment.


26.9. (a) Change $t$ to $-t$ and $u$ to $-u$, and show that the generating function
$g(t,u)$ of Legendre polynomials does not change.

(b) Now substitute $-t$ for $t$ and $-u$ for $u$ in Equation (26.30) and compare the
resulting equation with (26.30) to derive the parity of Legendre polynomials.

26.10. (a) Show that $(1/t)[\ln(1+t) - \ln(1-t)]$ is an even function of $t$.

(b) Use the Maclaurin expansion of $\ln(1\pm t)$ to derive the following series:

$$\frac{1}{t}\left[\ln(1+t) - \ln(1-t)\right] = 2\sum_{k=0}^{\infty}\frac{t^{2k}}{2k+1}.$$


26.11. (a) Show that $P_n(0) = 0$ if $n$ is odd.

(b) Show that for $u = 0$, Equation (26.38) yields

$$P_{2n}(0) = -\frac{2n-1}{2n}P_{2n-2}(0).$$

(c) Iterate this relation and obtain

$$P_{2n}(0) = (-1)^n\frac{(2n-1)!!}{(2n)!!}P_0(0) = (-1)^n\frac{(2n-1)!!}{(2n)!!}.$$

Now use the result of Problem 11.1 to obtain the final form of (26.39).
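26.12. Suppose $f(x) = \sum_{k=0}^{\infty}c_kP_k(x)$. Show that

$$\int_{-1}^{1}[f(x)]^2\,dx = \sum_{m=0}^{\infty}\frac{2c_m^2}{2m+1}.$$

26.13. Show the following two equalities:

$$\begin{aligned}(1-z^2)\frac{d^2P_n}{dz^2} - 2z\frac{dP_n}{dz} + n(n+1)P_n &= \frac{n+1}{2^n(2\pi i)}\oint_C\frac{(\zeta^2-1)^n\left[n\zeta^2 - 2(n+1)\zeta z + n + 2\right]}{(\zeta-z)^{n+3}}\,d\zeta\\ &= \frac{n+1}{2^n(2\pi i)}\oint_C\frac{d}{d\zeta}\left[\frac{(\zeta^2-1)^{n+1}}{(\zeta-z)^{n+2}}\right]d\zeta.\end{aligned}$$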

26.14. In the integral $\int_{-1}^{1}(x^2-1)^n\,dx$, let $u = (x^2-1)^n$ and $dv = dx$ and
integrate by parts to show that

$$\int_{-1}^{1}(x^2-1)^n\,dx = -2n\int_{-1}^{1}x^2(x^2-1)^{n-1}\,dx.$$

Integrate by parts a few more times and show that

$$\int_{-1}^{1}(x^2-1)^n\,dx = (-2)^m\frac{n(n-1)\cdots(n-m+1)}{(2m-1)!!}\int_{-1}^{1}x^{2m}(x^2-1)^{n-m}\,dx.$$

Set $m = n$ and, using the result of Problem 11.1, obtain the following final
result:

$$\int_{-1}^{1}(x^2-1)^n\,dx = (-1)^n2^{2n+1}\frac{(n!)^2}{(2n+1)!}.$$

26.15. Use the procedures of Example 26.5.4 and the previous problem to
show that for $m \ge n$:

$$\int_{-1}^{1}x^mP_n(x)\,dx = \begin{cases}0 & \text{if } m \text{ and } n \text{ have opposite parities},\\[4pt] \dfrac{2^{n+1}m!\left(\frac{m+n}{2}\right)!}{\left(\frac{m-n}{2}\right)!\,(m+n+1)!} & \text{if } m \text{ and } n \text{ have the same parity},\end{cases}$$

where having the same parity means being both even or both odd.

26.16. Show that $\int_0^1P_{2k}(x)\,dx = 0$ if $k \ge 1$. Hint: Extend the interval of
integration to $(-1,1)$ and use the orthogonality of Legendre polynomials.

26.17. Find the Legendre expansion for the function $f(x) = |x|$ in the interval
$(-1,+1)$. Hint: Break up the integrals into two pieces, employ the recurrence
relation to express $xP_n(x)$ in terms of $P_{n-1}(x)$ and $P_{n+1}(x)$, and use the
result of Example 26.6.1.

26.18. (a) Find the total charge on the upper and lower hemispheres and on 
the entire sphere of Example 26.7.3. 

(b) Using $\mathbf{p} = \int\mathbf{r}'\,dq(\mathbf{r}')$, calculate the (induced) dipole moment of the sphere.

26.19. Suppose that the sphere of Example 26.7.3 is held at potential $V_0$.

(a) Find the potential $\Phi(r,\theta)$ and the electrostatic field at all points in space.

(b) Calculate the surface charge density on the sphere. 

(c) Find the total charge on the upper and lower hemispheres and on the 
entire sphere. 


26.20. Using the infinite series expansion, find the electrostatic potential
both inside and outside a conducting sphere of radius $a$ held at the constant
potential $V_0$.

26.21. Find the electrostatic potential inside a sphere of radius $a$ with an
insulating small gap at the equator if the bottom hemisphere is grounded and
the top hemisphere is maintained at a constant potential $V_0$.

26.22. A sphere of radius $a$ is maintained at a temperature $T_0$. The sphere is
inside a large heat-conducting mass. Find the expressions for the steady-state
temperature distribution both inside and outside the sphere.

26.23. A ring of total charge $q$ and radius $a$ in the $xy$-plane with its center
at the origin constitutes an azimuthally symmetric charge distribution whose
potential is also azimuthally symmetric.

(a) Write the most general potential function valid for $r > a$.

(b) By direct integration show that

$$\Phi(r,\theta = 0) = \frac{1}{4\pi\epsilon_0}\int\frac{dq(\mathbf{r}')}{|\mathbf{r}-\mathbf{r}'|}\bigg|_{\theta=0} = \frac{q}{4\pi\epsilon_0}\frac{1}{\sqrt{r^2+a^2}}.$$

(c) Expand this expression in powers of $(a/r)$ and compare the result with
the series in (a) to find the coefficients of Legendre expansion and show that

$$\Phi(r,\theta) = \frac{q}{4\pi\epsilon_0r}\sum_{k=0}^{\infty}\frac{(-1)^k(2k)!}{2^{2k}(k!)^2}\left(\frac{a}{r}\right)^{2k}P_{2k}(\cos\theta).$$

(d) Find a similar expression for $\Phi(r,\theta)$ for $r < a$.

26.24. A conducting sphere of radius $a$ is inside another conducting sphere
of radius $b$. The inner sphere is held at potential $V_1$; the outer sphere at
$V_2$. Find the potential inside the inner sphere, between the two spheres, and
outside the outer sphere.


26.25. A conducting sphere of radius $a$ is inside another conducting sphere
of radius $b$ which is composed of two hemispheres with an infinitesimal gap
between them. The inner sphere is held at potential $V_1$. The upper half of the
outer sphere is at potential $+V_2$ and its lower half at $-V_2$. Find the potential
inside the inner sphere, between the two spheres, and outside the outer sphere.


26.26. A heat-conducting sphere of radius $a$ is composed of two hemispheres
with an infinitesimal gap between them. The upper and lower halves of the
sphere are in contact with heat baths of temperatures $+T_1$ and $-T_1$, respectively.
The sphere is inside a second heat-conducting sphere of radius $b$ held
at temperature $T_2$. Find the temperature inside the inner sphere, between
the two spheres, and outside the outer sphere.




Chapter 27 

Laplace’s Equation: 
Cylindrical Coordinates 


Before working specific examples of cylindrical geometry, let us consider a
question that has more general implications. We saw in Chapter 22 that
separation of variables led to ODEs in which certain constants appeared,
and that different choices of signs for these constants can lead to different
functional forms of the general solution. For example, an equation such as
$d^2x/dt^2 - kx = 0$ can have exponential solutions if $k > 0$ and trigonometric
solutions if $k < 0$. One cannot a priori assign a specific sign to $k$. Thus, the
general form of the solution is indeterminate. However, once the boundary
conditions are imposed, the unique solution will emerge regardless of the
initial functional form assumed for it. The following argument illustrates this
point on the angular DE resulting from the separation of Laplace’s equation
in cylindrical coordinates.


27.1 The ODEs 

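The separation of variables $\Phi(\rho,\varphi,z) = R(\rho)S(\varphi)Z(z)$ for Laplace’s equation
$\nabla^2\Phi = 0$ yields the following three ODEs [see Equation (22.14), noting that
$\lambda = 0$]. In what follows, we shall use $\lambda$ for $\lambda_1$:

$$\frac{d}{d\rho}\left(\rho\frac{dR}{d\rho}\right) + \left(\lambda\rho + \frac{\mu}{\rho}\right)R = 0, \qquad \frac{d^2S}{d\varphi^2} - \mu S = 0, \qquad \frac{d^2Z}{dz^2} - \lambda Z = 0. \tag{27.1}$$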


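Let us concentrate on the second equation, whose most general solution we
can write as

$$S(\varphi) = \begin{cases}Ae^{\sqrt{\mu}\,\varphi} + Be^{-\sqrt{\mu}\,\varphi} & \text{if } \mu \ne 0,\\ C\varphi + D & \text{if } \mu = 0.\end{cases} \tag{27.2}$$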


No matter what type of boundary conditions are imposed on the potential $\Phi$,
it must give the same value at $\varphi$ and at $\varphi + 2\pi$ while keeping the other two
variables fixed.$^1$ This is because $(\rho,\varphi,z)$ and $(\rho,\varphi+2\pi,z)$ represent the same
physical point in space. It follows that

$$R(\rho)S(\varphi)Z(z) = R(\rho)S(\varphi+2\pi)Z(z) \;\Longrightarrow\; S(\varphi+2\pi) = S(\varphi)$$

because the identity holds for all values of $\rho$ and $z$. If the last relation is to
be true for the case of $\mu = 0$, we must have $C = 0$ and $S(\varphi) = D$. For $\mu \ne 0$,
Equation (27.2) yields

$$Ae^{\sqrt{\mu}(\varphi+2\pi)} + Be^{-\sqrt{\mu}(\varphi+2\pi)} = Ae^{\sqrt{\mu}\varphi} + Be^{-\sqrt{\mu}\varphi}$$

or

$$Ae^{\sqrt{\mu}\varphi}\left(e^{2\pi\sqrt{\mu}} - 1\right) + Be^{-\sqrt{\mu}\varphi}\left(e^{-2\pi\sqrt{\mu}} - 1\right) = 0.$$

This must hold for all $\varphi$. The only way that can happen (we want to keep $A$
and $B$ nonzero) is to have

$$e^{2\pi\sqrt{\mu}} - 1 = 0 \quad\text{and}\quad e^{-2\pi\sqrt{\mu}} - 1 = 0,$$

both of which are equivalent to $e^{2\pi\sqrt{\mu}} = 1$.$^2$ If $\sqrt{\mu}$ is real (i.e., $\mu > 0$), this
condition cannot be met and we get only trivial solutions. To avoid this, we have to have $\sqrt{\mu} = im$ for
$m = 0, \pm1, \pm2, \dots$, or $\mu = -m^2$ for $m = 0, \pm1, \pm2, \dots$. With this choice of $\mu$,
the DE for $S(\varphi)$ becomes $S'' + m^2S = 0$, whose general solution is a sum of
trigonometric functions. We summarize this finding:

Theorem 27.1.1. For all physical problems for which the azimuthal angle
varies between 0 and $2\pi$, one is forced to restrict the value of $\mu$ to the negative
of the square of an integer. The solution for the angular part then becomes

$$S(\varphi) = A_m\cos m\varphi + B_m\sin m\varphi, \qquad m = 0, 1, 2, \dots, \tag{27.3}$$

where $A_m$ and $B_m$ are constants that may differ for different $m$’s.

The negative values of $m$ will not give rise to any new solutions, so they
are not included in the range of $m$. The case of $\mu = 0$ need not be treated
separately, because the acceptable solution for this case is $S = D = \text{const.}$,
which is what is obtained in (27.3) when $m = 0$.

$^1$This argument is valid only for physical situations defined for the entire range of $\varphi$.
If the region of interest restricts $\varphi$ to a subset of the interval $[0, 2\pi]$, the argument breaks
down.

$^2$The second equation can be obtained by multiplying the first equation by $e^{-2\pi\sqrt{\mu}}$.
The DE for $Z(z)$ is independent of $m$ and has an exponential solution
if $\lambda > 0$ and a trigonometric solution if $\lambda < 0$. Assuming the former, and
writing $\lambda = l^2$, we have

$$Z(z) = Ae^{lz} + Be^{-lz}. \tag{27.4}$$

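Least familiar is the radial DE which, in terms of $l = \sqrt{\lambda}$, can be rewritten
as

$$\frac{d^2R}{d\rho^2} + \frac{1}{\rho}\frac{dR}{d\rho} + \left(l^2 - \frac{m^2}{\rho^2}\right)R = 0. \tag{27.5}$$

Furthermore, if we define the variable $v = l\rho$, we can cast (27.5) in the form

$$\frac{d^2R}{dv^2} + \frac{1}{v}\frac{dR}{dv} + \left(1 - \frac{m^2}{v^2}\right)R = 0. \tag{27.6}$$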


Equation (27.5), or (27.6), is one of the most famous DEs of mathematical
physics, called the Bessel differential equation. Our task for the remainder
of this chapter is to find solutions of this DE and list some of their properties
and examples of their usage.


Friedrich Wilhelm Bessel 1784-1846

Friedrich Wilhelm Bessel showed no signs of unusual academic ability in school,
although he did show a liking for mathematics and physics. He left school intending
to become a merchant’s apprentice, a desire that soon materialized with a seven-year
unpaid apprenticeship with a large mercantile firm in Bremen. The young Bessel
proved so adept at accounting and calculation that he was granted a small salary,
with raises, after only the first year. An interest in foreign trade led Bessel to study
geography and languages at night, astonishingly learning to read and write English
in only three months. He also studied navigation in order to qualify as a cargo officer
aboard ship, but his innate curiosity soon compelled him to investigate astronomy
at a more fundamental level. Still serving his apprenticeship, Bessel learned to
observe the positions of stars with sufficient accuracy to determine the longitude
of Bremen, checking his results against professional astronomical journals. He then
tackled the more formidable problem of determining the orbit of Halley’s comet
from published observations. After seeing the close agreement between Bessel’s
calculations and those of Halley, the German astronomer Olbers encouraged Bessel
to improve his already impressive work with more observations. The improved
calculations, an achievement tantamount to a modern doctoral dissertation, were
published with Olbers’s recommendation. Bessel later received appointments with
increasing authority at observatories near Bremen and in Königsberg, the latter
position being accompanied by a professorship. (The title of doctor, required for the
professorship, was granted by the University of Göttingen on the recommendation
of Gauss.)

Bessel proved himself an excellent observational astronomer. His careful measurements
coupled with his mathematical aptitude allowed him to produce accurate
positions for a number of previously mapped stars, taking account of instrumental
effects, atmospheric refraction, and the position and motion of the observation site.
In 1820 he determined the position of the vernal equinox accurate to 0.01 second, in
agreement with modern values. His observation of the variation of the proper motion
of the stars Sirius and Procyon led him to posit the existence of nearby, large,
low-luminosity stars called dark companions. Between 1821 and 1833 he catalogued the
positions of about 75,000 stars, publishing his measurements in detail. One of his 
most important contributions to astronomy was the determination of the distance 
to a star using parallax. This method uses triangulation, or the determination of 
the apparent positions of a distant object viewed from two points a known distance 
apart, in this case two diametrically opposed points of the Earth’s orbit. The angle 
subtended by the baseline of the Earth’s orbit, viewed from the star’s perspective, 
is known as the star’s parallax. Before Bessel’s measurement, stars were assumed 
to be so distant that their parallaxes were too small to measure, and it was further 
assumed that bright stars (thought to be nearer) would have the largest parallax. 
Bessel correctly reasoned that stars with large proper motions were more likely to 
be nearby ones and selected such a star, 61 Cygni, for his historic measurement. His 
measured parallax for that star differs by less than 8% from the currently accepted 
value. 

Given such an impressive record in astronomy, it seems only fitting that the
famous functions that bear Bessel’s name grew out of his investigations of perturbations
in planetary systems. He showed that such perturbations could be divided
into two effects and treated separately: the obvious direct attraction due to the
perturbing planet and an indirect effect caused by the Sun’s response to the
perturber’s force. The so-called Bessel functions then appear as coefficients in the series
treatment of the indirect perturbation. Although special cases of Bessel functions
were discovered by Bernoulli, Euler, and Lagrange, the systematic treatment by Bessel
clearly established his preeminence, a fitting tribute to the creator of the most
famous functions in mathematical physics.


27.2 Solutions of the Bessel DE 


The Frobenius method is an effective way of finding solutions for ODEs. We rewrite (27.6) by multiplying it by $v^2$ to turn all its coefficients into polynomials, as suggested by Equation (26.7). This yields

$$v^2\frac{d^2R}{dv^2} + v\frac{dR}{dv} + (v^2 - m^2)R = 0. \qquad (27.7)$$

Since $v^2$, the coefficient of the second derivative, vanishes at $v = 0$, we must assume a solution of the form

$$R(v) = v^s\sum_{k=0}^{\infty}c_kv^k = \sum_{k=0}^{\infty}c_kv^{k+s},$$

from which we obtain

$$v\frac{dR}{dv} = \sum_{k=0}^{\infty}c_k(k+s)v^{k+s}, \qquad v^2\frac{d^2R}{dv^2} = \sum_{k=0}^{\infty}c_k(k+s)(k+s-1)v^{k+s}.$$
dv 
Substituting these as well as $(v^2 - m^2)\sum_{k=0}^{\infty}c_kv^{k+s}$ in the DE yields

$$\sum_{k=0}^{\infty}c_k\Big[\underbrace{k+s+(k+s)(k+s-1)}_{=(k+s)^2} - m^2\Big]v^{k+s} + \sum_{k=0}^{\infty}c_kv^{k+s+2} = 0.$$

To find the recursion relation, we need to have the same power of $v$ in both sums. We do this by rewriting the first sum as

$$\begin{aligned}
&c_0(s^2 - m^2)v^s + c_1[(s+1)^2 - m^2]v^{s+1} + \sum_{k=2}^{\infty}c_k[(k+s)^2 - m^2]v^{k+s}\\
&\quad = c_0(s^2 - m^2)v^s + c_1[(s+1)^2 - m^2]v^{s+1} + \sum_{n=0}^{\infty}c_{n+2}[(n+2+s)^2 - m^2]v^{n+2+s},
\end{aligned}$$


where in the second line we introduced $n = k - 2$. Since $n$ is a dummy index, we can change it back to $k$. It then follows that

$$c_0(s^2 - m^2)v^s + c_1[(s+1)^2 - m^2]v^{s+1} + \sum_{k=0}^{\infty}\left\{c_{k+2}[(k+2+s)^2 - m^2] + c_k\right\}v^{k+2+s} = 0.$$


Assuming that $c_0 \neq 0$ and setting the coefficients of all powers of $v$ equal to zero, we get

$$s^2 = m^2, \qquad c_1[(s+1)^2 - m^2] = 0, \qquad c_{k+2}[(k+2+s)^2 - m^2] + c_k = 0.$$


The first equation gives $m = \pm s$. Inserting this in the second equation gives $c_1(2s+1) = 0 \Rightarrow c_1 = 0$ or $s = -\tfrac{1}{2}$. The choice $s = -\tfrac{1}{2}$ gives $m = \pm\tfrac{1}{2}$, which is not acceptable,³ as we decided that $m$ is to be a positive integer. We therefore conclude that $s = \pm m$ and $c_1 = 0$. It follows from the recursion relation that all odd $c$'s are zero. The Frobenius series will therefore look like


$$R(v) = v^{\pm m}\sum_{k=0}^{\infty}c_{2k}v^{2k}, \qquad \frac{c_{2k+2}}{c_{2k}} = -\frac{1}{(2k+2+s)^2 - m^2}. \qquad (27.8)$$


recursion relation for Bessel equation

The ratio test for the convergence of series yields

$$\lim_{k\to\infty}\left|\frac{c_{2k+2}v^{2k+2}}{c_{2k}v^{2k}}\right| = \lim_{k\to\infty}\frac{v^2}{|(2k+2+s)^2 - m^2|} = 0,$$

which indicates that

Box 27.2.1. The series of Equation (27.8) is convergent for all values of $v$.

³ Actually, problems arising from other areas of physics beyond electrostatics and steady-state heat transfer allow noninteger values of $m$. However, we shall not deal with such problems here.


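We now use the recursion relation to obtain the coefficients of expansion. Rewrite the recursion relation as

$$c_{k+2} = -\frac{1}{(k+2+s)^2 - s^2}\,c_k = -\frac{1}{(k+2)(2s+k+2)}\,c_k,$$

where we substituted $s^2$ for $m^2$. This gives

$$\begin{aligned}
c_2 &= -\frac{1}{2(2s+2)}\,c_0,\\
c_4 &= -\frac{1}{4(2s+4)}\,c_2 = (-1)^2\frac{1}{4(2s+4)\,2(2s+2)}\,c_0,\\
c_6 &= -\frac{1}{6(2s+6)}\,c_4 = (-1)^3\frac{1}{6(2s+6)\,4(2s+4)\,2(2s+2)}\,c_0,
\end{aligned}$$

and, in general,

$$c_{2k} = \frac{(-1)^k}{\underbrace{2k\,(2k-2)\cdots 2}_{=2^kk!}\;\underbrace{(2s+2k)[2s+(2k-2)]\cdots(2s+2)}_{=2^k(s+k)(s+k-1)\cdots(s+1)}}\,c_0.$$

Multiplying the numerator and denominator by $s!$, we obtain

$$c_{2k} = \frac{(-1)^k\,s!}{2^{2k}\,k!\,(s+k)!}\,c_0. \qquad (27.9)$$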
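The step from the recursion relation to the closed form (27.9) is easy to check mechanically. The short Python sketch below (an illustration, not part of the original derivation) builds $c_0, c_2, c_4, \ldots$ from the recursion in exact rational arithmetic and compares them with (27.9); it assumes integer $s \geq 0$ and $c_0 = 1$, and the function names are our own.

```python
from fractions import Fraction
from math import factorial


def coefficients_by_recursion(s, n_terms, c0=Fraction(1)):
    """Return [c_0, c_2, ..., c_{2(n_terms-1)}] from c_{k+2} = -c_k / ((k+2)(2s+k+2))."""
    coeffs = [c0]
    for k in range(0, 2 * (n_terms - 1), 2):
        coeffs.append(-coeffs[-1] / ((k + 2) * (2 * s + k + 2)))
    return coeffs


def coefficient_closed_form(s, k, c0=Fraction(1)):
    """Equation (27.9): c_{2k} = (-1)^k s! c_0 / (2^{2k} k! (s+k)!), for integer s >= 0."""
    return c0 * Fraction((-1) ** k * factorial(s),
                         2 ** (2 * k) * factorial(k) * factorial(s + k))


# Exact fractions make the comparison exact rather than approximate.
for s in range(3):
    print(s, coefficients_by_recursion(s, 5) == [coefficient_closed_form(s, k) for k in range(5)])
```

Using `Fraction` rather than floats is a deliberate choice here: it shows that the two expressions agree identically, not merely to rounding error.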

Substituting (27.9) in (27.8) yields

$$R(v) = c_0\,s!\,v^s\sum_{k=0}^{\infty}\frac{(-1)^k}{2^{2k}\,k!\,(s+k)!}\,v^{2k} = c_0\,s!\,2^s\sum_{k=0}^{\infty}\frac{(-1)^k}{k!\,(s+k)!}\left(\frac{v}{2}\right)^{2k+s},$$

where we substituted $s$ for $\pm m$ in the exponent of $v$ outside the summation. We also absorbed the powers of 2 in the denominator of the sum into the powers of $v$, and outside the sum, we multiplied and divided by $2^s$. It is customary to choose the arbitrary constant $c_0$ to be equal to $1/(s!\,2^s)$. This leads to


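Box 27.2.2. The Bessel function of order $s$ is denoted by $J_s$ and is given by the series

$$J_s(x) = \left(\frac{x}{2}\right)^s\sum_{k=0}^{\infty}\frac{(-1)^k}{k!\,(s+k)!}\left(\frac{x}{2}\right)^{2k}, \qquad (27.10)$$

which is convergent for all values of $x$.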
Although Equation (27.10) was derived assuming that $m$, and therefore $s$, was an integer, lifting this restriction will still yield a series that is convergent everywhere, and one can define Bessel functions whose orders are real or even complex numbers. The only difficulty is to correctly interpret $(s+k)!$ for noninteger $s$. But this is precisely what the gamma function was invented for (see Definition 11.1.1): we read $(s+k)!$ as $\Gamma(s+k+1)$. Thus, we let Equation (27.10) stand for Bessel functions of all orders.
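Equation (27.10), with $(s+k)!$ read as $\Gamma(s+k+1)$, can be summed directly. The Python sketch below does this for real $s$ and $x > 0$. It truncates the series after a fixed number of terms, an assumption that is harmless for moderate $x$ because the terms decay factorially. A convenient check of the gamma-function generalization is the half-integer case, for which $J_{1/2}(x) = \sqrt{2/(\pi x)}\sin x$ and $J_{-1/2}(x) = \sqrt{2/(\pi x)}\cos x$.

```python
import math


def bessel_j(s, x, n_terms=40):
    """Partial sum of Equation (27.10) for real order s and x > 0.

    (s + k)! is interpreted as Gamma(s + k + 1). Because 1/Gamma vanishes at
    0, -1, -2, ..., terms whose Gamma argument is a non-positive integer are
    skipped; this is what makes J_{-m} for integer m come out right.
    """
    total = 0.0
    for k in range(n_terms):
        arg = s + k + 1
        if arg <= 0 and float(arg).is_integer():
            continue  # 1/Gamma(arg) = 0 for these terms
        total += (-1) ** k / (math.factorial(k) * math.gamma(arg)) * (x / 2) ** (2 * k)
    return (x / 2) ** s * total


x = 2.3
print(bessel_j(0.5, x), math.sqrt(2 / (math.pi * x)) * math.sin(x))
```

The same function reproduces $J_{-1}(x) = -J_1(x)$, in line with Equation (27.14), because the $k = 0$ term drops out when $s = -1$.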


27.3 Second Solution of the Bessel DE 


As in the case of Legendre polynomials, we can obtain a second solution of the Bessel DE using Equation (24.6). For the Bessel DE, we have $p(x) = 1/x$. Using $J_m(x)$ as our input, we can generate another solution. With $C = 0$ in Equation (24.6), we obtain

$$Z_m(x) = KJ_m(x)\int_\alpha^x\frac{1}{J_m^2(u)}\exp\left(-\int_c^u\frac{dt}{t}\right)du = A_mJ_m(x)\int_\alpha^x\frac{du}{u\,J_m^2(u)},$$

where $A_m = Kc$ and $\alpha$ are arbitrary constants determined by convention. Note that, contrary to $J_m(x)$, $Z_m(x)$ is not well behaved at $x = 0$ due to the presence of $u$ in the denominator of the integrand.

Although the above procedure manufactures a second solution for the Bessel DE, it is not the customary procedure. It turns out that for noninteger $s$, the Bessel function $J_{-s}(x)$ is independent of $J_s(x)$ and can be used as a second solution.⁴ However, a more common second solution is the linear combination

$$Y_s(x) = \frac{J_s(x)\cos s\pi - J_{-s}(x)}{\sin s\pi}, \qquad (27.11)$$

called the Bessel function of the second kind, or the Neumann function.
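For integer $s$ the function is indeterminate (of the form $0/0$) because of Equation (11.32) and the identity $\cos n\pi = (-1)^n$. Therefore, we use l'Hôpital's rule and define

$$Y_n(x) = \lim_{s\to n}Y_s(x) = \lim_{s\to n}\frac{\dfrac{\partial}{\partial s}\left[J_s(x)\cos s\pi - J_{-s}(x)\right]}{\pi\cos s\pi} = \frac{1}{\pi}\lim_{s\to n}\left[\frac{\partial J_s}{\partial s} - (-1)^n\frac{\partial J_{-s}}{\partial s}\right].$$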


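From (27.10), with $(s+k)!$ written as $\Gamma(s+k+1)$, we obtain

$$\frac{\partial J_s}{\partial s} = J_s(x)\ln\left(\frac{x}{2}\right) - \left(\frac{x}{2}\right)^s\sum_{k=0}^{\infty}\frac{(-1)^k\,\psi(s+k+1)}{k!\,\Gamma(s+k+1)}\left(\frac{x}{2}\right)^{2k},$$

where

$$\psi(l) \equiv \frac{d}{dl}\ln[(l-1)!] = \frac{d}{dl}\ln\Gamma(l) = \frac{\Gamma'(l)}{\Gamma(l)}.$$

⁴ See Hassani, S. Mathematical Physics: A Modern Introduction to Its Foundations, Springer-Verlag, 1999, Chapter 14 for details.

Equation (27.10) is valid not only for integer s, but also for real and even complex s.

Bessel function of the second kind or Neumann function

Similarly,

$$\frac{\partial J_{-s}}{\partial s} = -J_{-s}(x)\ln\left(\frac{x}{2}\right) + \left(\frac{x}{2}\right)^{-s}\sum_{k=0}^{\infty}\frac{(-1)^k\,\psi(-s+k+1)}{k!\,\Gamma(-s+k+1)}\left(\frac{x}{2}\right)^{2k}.$$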


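Substituting these expressions in the definition of $Y_n(x)$ and using $J_{-n}(x) = (-1)^nJ_n(x)$ [Equation (11.32)], we obtain

$$\begin{aligned}
Y_n(x) ={}& \frac{2}{\pi}J_n(x)\ln\left(\frac{x}{2}\right) - \frac{1}{\pi}\left(\frac{x}{2}\right)^{n}\sum_{k=0}^{\infty}\frac{(-1)^k\,\psi(n+k+1)}{k!\,\Gamma(n+k+1)}\left(\frac{x}{2}\right)^{2k}\\
&- \frac{(-1)^n}{\pi}\left(\frac{x}{2}\right)^{-n}\sum_{k=0}^{\infty}\frac{(-1)^k\,\psi(k-n+1)}{k!\,\Gamma(k-n+1)}\left(\frac{x}{2}\right)^{2k}. \qquad (27.12)
\end{aligned}$$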


It should be clear from (27.12) that the Neumann function $Y_s(x)$ is ill defined at $x = 0$, as expected of a second solution of the Bessel DE such as $Z_m(x)$ discussed above.
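Equation (27.11) can be evaluated directly for noninteger order, and the limit that defines $Y_n$ can be approximated numerically. The sketch below reuses the series (27.10) with $(s+k)!$ read as $\Gamma(s+k+1)$. Instead of the closed form (27.12), it approximates $\lim_{s\to n}Y_s(x)$ by the symmetric average of $Y_{n-\epsilon}$ and $Y_{n+\epsilon}$. This is a numerical shortcut chosen here for brevity, and the step $\epsilon = 10^{-4}$ is an arbitrary assumption. For $s = \tfrac{1}{2}$, Equation (27.11) gives $Y_{1/2}(x) = -J_{-1/2}(x) = -\sqrt{2/(\pi x)}\cos x$, which serves as an exact check.

```python
import math


def bessel_j(s, x, n_terms=40):
    """Series (27.10) with (s+k)! read as Gamma(s+k+1); real s, x > 0."""
    total = 0.0
    for k in range(n_terms):
        arg = s + k + 1
        if arg <= 0 and float(arg).is_integer():
            continue  # 1/Gamma vanishes at non-positive integers
        total += (-1) ** k / (math.factorial(k) * math.gamma(arg)) * (x / 2) ** (2 * k)
    return (x / 2) ** s * total


def neumann_y(s, x):
    """Equation (27.11); usable directly only for non-integer order s."""
    return (bessel_j(s, x) * math.cos(s * math.pi) - bessel_j(-s, x)) / math.sin(s * math.pi)


def neumann_y_integer(n, x, eps=1e-4):
    """Approximate Y_n(x) = lim_{s->n} Y_s(x) by averaging Y_{n-eps} and Y_{n+eps}.

    Y_s is smooth in s, so the symmetric average has an O(eps^2) error. This is a
    numerical stand-in for the l'Hopital limit, not an implementation of (27.12).
    """
    return 0.5 * (neumann_y(n - eps, x) + neumann_y(n + eps, x))


print(neumann_y(0.5, 2.0), neumann_y_integer(0, 1.0))
```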

Since $Y_s(x)$ is linearly independent of $J_s(x)$ for any $s$, integer or noninteger, it is convenient to consider $\{J_s(x), Y_s(x)\}$ as a basis of solutions for the Bessel DE. In particular, the solution of the radial equation in cylindrical coordinates, i.e., the first equation in (27.1), becomes

$$R(\rho) = AJ_m(v) + BY_m(v) = AJ_m(l\rho) + BY_m(l\rho). \qquad (27.13)$$


27.4 Properties of the Bessel Functions 

We have already considered some properties of the Bessel functions in Chapter 11. In this section, we quote those results and obtain other useful properties of the Bessel functions.


27.4.1 Negative Integer Order 

Equation (11.32) gives a relation between a Bessel function of integer order and the Bessel function whose order is the negative of the first one:

$$J_{-m}(x) = (-1)^mJ_m(x). \qquad (27.14)$$

27.4.2 Recurrence Relations 

A number of recurrence relations involving Bessel functions of integer orders and their derivatives were derived in Chapter 11, which we reproduce here. The first one, involving no derivatives, is

$$J_{m-1}(x) + J_{m+1}(x) = \frac{2m}{x}J_m(x). \qquad (27.15)$$

The second one, which includes derivatives of Bessel functions, is

$$J_{m-1}(x) - J_{m+1}(x) = 2J'_m(x). \qquad (27.16)$$

Combining these two equations, one obtains

$$J_{m-1}(x) = \frac{m}{x}J_m(x) + J'_m(x), \qquad J_{m+1}(x) = \frac{m}{x}J_m(x) - J'_m(x). \qquad (27.17)$$


We can use these equations to obtain new, and more useful, relations. For example, by differentiating $x^mJ_m(x)$, we get

$$[x^mJ_m(x)]' = mx^{m-1}J_m(x) + x^mJ'_m(x) = x^m\underbrace{\left[\frac{m}{x}J_m(x) + J'_m(x)\right]}_{=J_{m-1}(x)\text{ by (27.17)}} = x^mJ_{m-1}(x).$$

Integrating (really, antidifferentiating) this equation yields

$$\int x^mJ_{m-1}(x)\,dx = x^mJ_m(x). \qquad (27.18)$$

Similarly, the reader may check that

$$\int x^{-m}J_{m+1}(x)\,dx = -x^{-m}J_m(x). \qquad (27.19)$$
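These recurrence and antiderivative formulas are easy to spot-check numerically. The following Python sketch assumes integer orders and uses the series (27.10) together with (27.14) for negative orders. It checks (27.15), $J_{m-1}(x) + J_{m+1}(x) = (2m/x)J_m(x)$, directly, and (27.16) with a central-difference derivative. It also checks (27.18) and (27.19) as definite integrals with Simpson's rule. The step sizes and interval endpoints are illustrative choices.

```python
import math


def bessel_j(m, x, n_terms=40):
    """J_m(x) for integer m: series (27.10) for m >= 0, Equation (27.14) for m < 0."""
    if m < 0:
        return (-1) ** m * bessel_j(-m, x, n_terms)
    return sum((-1) ** k / (math.factorial(k) * math.factorial(m + k)) * (x / 2) ** (2 * k + m)
               for k in range(n_terms))


def simpson(f, lo, hi, n=400):
    """Composite Simpson rule on [lo, hi] with an even number n of subintervals."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3


x, h = 2.7, 1e-5
for m in range(4):
    rhs_15 = 2 * m / x * bessel_j(m, x)                          # right side of (27.15)
    rhs_16 = (bessel_j(m, x + h) - bessel_j(m, x - h)) / h       # 2 J'_m by central difference
    print(m, bessel_j(m - 1, x) + bessel_j(m + 1, x) - rhs_15,
          bessel_j(m - 1, x) - bessel_j(m + 1, x) - rhs_16)
```

Because (27.18) and (27.19) are antiderivatives, the checks compare definite integrals with differences of the right-hand sides at the endpoints.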


27.4.3 Orthogonality 

Bessel functions satisfy an orthogonality relation similar to that of the Legendre polynomials. However, unlike Legendre polynomials, the quantity that determines the orthogonality of different Bessel functions is not the order but a parameter in their argument (also see Example 24.5.3).

Consider two solutions of the Bessel DE corresponding to the same azimuthal parameter, but with different radial parameters. More specifically, let $f(\rho) = J_m(k\rho)$ and $g(\rho) = J_m(l\rho)$. Then

$$\frac{d^2f}{d\rho^2} + \frac{1}{\rho}\frac{df}{d\rho} + \left(k^2 - \frac{m^2}{\rho^2}\right)f = 0, \qquad \frac{d^2g}{d\rho^2} + \frac{1}{\rho}\frac{dg}{d\rho} + \left(l^2 - \frac{m^2}{\rho^2}\right)g = 0.$$

The reader may check that by multiplying the first equation by $\rho g$ and the second equation by $\rho f$ and subtracting, one gets

recurrence relations involving derivatives

$$\frac{d}{d\rho}\left[\rho(fg' - gf')\right] = (k^2 - l^2)\rho fg,$$
where the prime indicates differentiation with respect to $\rho$. Now integrate this equation with respect to $\rho$ from some initial value (say $a$) to some final value (say $b$) to obtain

$$\Big[\rho(fg' - gf')\Big]_a^b = (k^2 - l^2)\int_a^b\rho f(\rho)g(\rho)\,d\rho.$$

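In all physical applications $a$ and $b$ can be chosen to make the LHS vanish (for instance, $a = 0$, where the factor $\rho$ kills the bracket, and $b$ such that $J_m(kb) = J_m(lb) = 0$). Then, substituting for $f$ and $g$ in terms of Bessel functions, we get

$$(k^2 - l^2)\int_a^b\rho J_m(k\rho)J_m(l\rho)\,d\rho = 0.$$

It follows that if $k \neq l$, then the integral vanishes, i.e.,

$$\int_a^b\rho J_m(k\rho)J_m(l\rho)\,d\rho = 0 \quad\text{if } k \neq l. \qquad (27.20)$$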


This is the orthogonality relation for Bessel functions also derived in Example 
24.5.3. 

To complete the orthogonality relation, we must also address the case when $k = l$. This involves the evaluation of the integral $\int\rho J_m^2(k\rho)\,d\rho$, which, upon the change of variable $x = k\rho$, reduces to $\left(\int xJ_m^2(x)\,dx\right)/k^2$. By integration by parts, we have

$$I \equiv \int xJ_m^2(x)\,dx = \tfrac{1}{2}x^2J_m^2(x) - \int J_m(x)J'_m(x)\,x^2\,dx.$$

In the last integral, substitute for $x^2J_m(x)$ from the Bessel DE (27.6), using $x$ instead of $v$:

$$x^2J_m(x) = m^2J_m(x) - xJ'_m(x) - x^2J''_m(x).$$

Therefore,

$$\begin{aligned}
I &= \tfrac{1}{2}x^2J_m^2(x) - \int J'_m(x)\left[m^2J_m(x) - xJ'_m(x) - x^2J''_m(x)\right]dx\\
&= \tfrac{1}{2}x^2J_m^2(x) - m^2\int J_m(x)J'_m(x)\,dx + \tfrac{1}{2}\int\frac{d}{dx}\left(x^2[J'_m(x)]^2\right)dx\\
&= \tfrac{1}{2}x^2J_m^2(x) - \tfrac{1}{2}m^2J_m^2(x) + \tfrac{1}{2}x^2[J'_m(x)]^2.
\end{aligned}$$

Returning to $\rho$, we obtain the indefinite integral

$$\int\rho J_m^2(k\rho)\,d\rho = \frac{1}{2}\left(\rho^2 - \frac{m^2}{k^2}\right)J_m^2(k\rho) + \frac{1}{2}\rho^2[J'_m(k\rho)]^2. \qquad (27.21)$$
In most applications, the lower limit of integration is zero and the upper limit is a positive number $a$. The RHS of (27.21) vanishes at the lower limit for the following reason. The first term vanishes at $\rho = 0$ because $J_m(0) = 0$ for all $m > 0$, as is evident from the series expansion (27.10). For $m = 0$ (and $\rho = 0$), the parentheses in the first term of (27.21) vanish. So, the first term is zero for all $m \geq 0$ at the lower limit of integration. The second term vanishes due to the presence of $\rho^2$. Thus, we obtain

$$\int_0^a\rho J_m^2(k\rho)\,d\rho = \frac{1}{2}\left(a^2 - \frac{m^2}{k^2}\right)J_m^2(ka) + \frac{1}{2}a^2[J'_m(ka)]^2 \qquad (27.22)$$

for all $m \geq 0$ and, by (27.14), also for all negative integers. As mentioned earlier, we shall confine our discussion to Bessel functions of integer orders. It is customary to simplify the RHS of (27.22) by choosing $k$ in such a way that $J_m(ka) = 0$, i.e., that $ka$ is a root of the Bessel function of order $m$. In general, there are infinitely many roots. So, let $x_{mn}$ denote the $n$th root of $J_m(x)$. Then,

$$ka = x_{mn} \quad\Rightarrow\quad k = \frac{x_{mn}}{a}, \qquad n = 1, 2, \ldots,$$

and if we use Equation (27.17), which gives $J'_m(x_{mn}) = -J_{m+1}(x_{mn})$, we obtain

$$\int_0^a\rho J_m^2(x_{mn}\rho/a)\,d\rho = \frac{1}{2}a^2\left[J_{m+1}(x_{mn})\right]^2. \qquad (27.23)$$

orthogonality relations involving Bessel functions

Equations (27.20) and (27.23) can be combined into a single equation using the Kronecker delta:

Box 27.4.1. The Bessel functions of integer order satisfy the orthogonality relations

$$\int_0^aJ_m(x_{mn}\rho/a)\,J_m(x_{mk}\rho/a)\,\rho\,d\rho = \frac{1}{2}a^2J_{m+1}^2(x_{mn})\,\delta_{kn}, \qquad (27.24)$$

where $a > 0$ and $x_{mn}$ is the $n$th root of $J_m(x)$.
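The orthogonality relations (27.24) can be confirmed numerically once the roots $x_{mn}$ are known. The Python sketch below is a minimal illustration under stated assumptions. It evaluates $J_m$ from the series (27.10), which is adequate for the moderate arguments used here, and locates the roots by scanning for sign changes and bisecting. The integrals are done with Simpson's rule. The helper names, the scan step, and the choice $a = 2$ are ours.

```python
import math


def bessel_j(m, x, n_terms=60):
    """J_m(x) for integer m >= 0 from the series (27.10); fine for x up to about 20."""
    return sum((-1) ** k / (math.factorial(k) * math.factorial(m + k)) * (x / 2) ** (2 * k + m)
               for k in range(n_terms))


def bessel_zeros(m, count, step=0.1):
    """First `count` positive roots of J_m, found by a sign scan followed by bisection."""
    zeros = []
    a, fa = 0.5, bessel_j(m, 0.5)
    while len(zeros) < count:
        b = a + step
        fb = bessel_j(m, b)
        if fa * fb < 0:                      # a root lies in (a, b)
            lo, hi, flo = a, b, fa
            for _ in range(60):
                mid = 0.5 * (lo + hi)
                fmid = bessel_j(m, mid)
                if flo * fmid <= 0:
                    hi = mid
                else:
                    lo, flo = mid, fmid
            zeros.append(0.5 * (lo + hi))
        a, fa = b, fb
    return zeros


def simpson(f, lo, hi, n=400):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3


def overlap(m, n, k, a):
    """Left-hand side of (27.24): integral of J_m(x_mn rho/a) J_m(x_mk rho/a) rho over [0, a]."""
    roots = bessel_zeros(m, max(n, k))
    xn, xk = roots[n - 1], roots[k - 1]
    return simpson(lambda r: bessel_j(m, xn * r / a) * bessel_j(m, xk * r / a) * r, 0.0, a)


a = 2.0
for m in (0, 1):
    x1 = bessel_zeros(m, 1)[0]
    print(m, overlap(m, 1, 2, a), overlap(m, 1, 1, a), 0.5 * a ** 2 * bessel_j(m + 1, x1) ** 2)
```

For much larger roots the power series loses accuracy in floating point; an integral representation such as (27.30) is then a safer way to evaluate $J_m$.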



27.4.4 Generating Function 

Just as in the case of Legendre polynomials, Bessel functions of integer order have a generating function, i.e., there exists a function $g(x,t)$ such that

$$g(x,t) = \sum_{n=-\infty}^{\infty}t^nJ_n(x). \qquad (27.25)$$

To find $g$, start with the recurrence relation

$$J_{m-1}(x) + J_{m+1}(x) = \frac{2m}{x}J_m(x),$$

multiply it by $t^m$, and sum over all $m$ to obtain

$$\sum_{m=-\infty}^{\infty}t^mJ_{m-1}(x) + \sum_{m=-\infty}^{\infty}t^mJ_{m+1}(x) = \frac{2}{x}\sum_{m=-\infty}^{\infty}m\,t^mJ_m(x). \qquad (27.26)$$

The first sum can be written as

$$\sum_{m=-\infty}^{\infty}t^mJ_{m-1}(x) = t\sum_{m=-\infty}^{\infty}t^{m-1}J_{m-1}(x) = t\sum_{n=-\infty}^{\infty}t^nJ_n(x) = t\,g(x,t),$$

where we substituted the dummy index $n = m - 1$ for $m$. Similarly,

$$\sum_{m=-\infty}^{\infty}t^mJ_{m+1}(x) = \frac{1}{t}\sum_{m=-\infty}^{\infty}t^{m+1}J_{m+1}(x) = \frac{1}{t}\,g(x,t)$$

and

$$\frac{2}{x}\sum_{m=-\infty}^{\infty}m\,t^mJ_m(x) = \frac{2t}{x}\sum_{m=-\infty}^{\infty}m\,t^{m-1}J_m(x) = \frac{2t}{x}\frac{\partial g}{\partial t}.$$

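It follows from Equation (27.26) that

$$\left(t + \frac{1}{t}\right)g = \frac{2t}{x}\frac{\partial g}{\partial t} \quad\Rightarrow\quad \frac{1}{g}\frac{\partial g}{\partial t} = \frac{x}{2}\left(1 + \frac{1}{t^2}\right),$$

where $x$ is assumed to be a constant because we have been differentiating with respect to $t$. Integrating both sides gives

$$\int\frac{1}{g}\frac{\partial g}{\partial t}\,dt = \ln g = \frac{x}{2}\int\left(1 + \frac{1}{t^2}\right)dt = \frac{x}{2}\left(t - \frac{1}{t}\right) + \ln\phi(x),$$

where the last term is the “constant” of integration. Thus,

$$g(x,t) = \phi(x)\exp\left[\frac{x}{2}\left(t - \frac{1}{t}\right)\right].$$

To determine $\phi(x)$, we note that

$$g(x,t) = \phi(x)\,e^{xt/2}\,e^{-x/(2t)} = \phi(x)\sum_{n=0}^{\infty}\frac{1}{n!}\left(\frac{xt}{2}\right)^n\sum_{m=0}^{\infty}\frac{1}{m!}\left(-\frac{x}{2t}\right)^m = \phi(x)\sum_{n=0}^{\infty}\sum_{m=0}^{\infty}\frac{(-1)^m}{n!\,m!}\left(\frac{x}{2}\right)^{n+m}t^{n-m}.$$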
In the last double sum, collect all terms whose power of $t$ is zero, and call the sum $S_0$. This is obtained by setting $n = m$. Then,

$$S_0 = \phi(x)\sum_{n=0}^{\infty}\frac{(-1)^n}{(n!)^2}\left(\frac{x}{2}\right)^{2n} = \phi(x)J_0(x),$$

where we used Equation (27.10) with $s = 0$. But (27.25) shows that the collection of all terms whose power of $t$ is zero is simply $J_0(x)$. Thus, $S_0 = J_0(x)$, and $\phi(x) = 1$. This leads to the final form of the Bessel generating function:

$$g(x,t) = \exp\left[\frac{x}{2}\left(t - \frac{1}{t}\right)\right] = \sum_{n=-\infty}^{\infty}t^nJ_n(x). \qquad (27.27)$$
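A direct numerical comparison of the two sides of (27.27) is a useful sanity check. The sketch below is illustrative only. It evaluates $J_n$ from (27.10) for $n \geq 0$ and from (27.14) for $n < 0$, and it truncates the Laurent series at $|n| \leq 30$, an assumption that is adequate for the modest $x$ and $t$ used here.

```python
import math


def bessel_j(n, x, n_terms=40):
    """J_n(x) for any integer n: series (27.10) for n >= 0 and (27.14) for n < 0."""
    if n < 0:
        return (-1) ** n * bessel_j(-n, x, n_terms)
    return sum((-1) ** k / (math.factorial(k) * math.factorial(n + k)) * (x / 2) ** (2 * k + n)
               for k in range(n_terms))


def generating_function(x, t):
    """Closed form of (27.27): exp[(x/2)(t - 1/t)]."""
    return math.exp(0.5 * x * (t - 1.0 / t))


def laurent_sum(x, t, n_max=30):
    """Truncation of sum_{n=-n_max}^{n_max} t^n J_n(x)."""
    return sum(t ** n * bessel_j(n, x) for n in range(-n_max, n_max + 1))


print(generating_function(1.5, 0.6), laurent_sum(1.5, 0.6))
```

Setting $t = 1$ gives the special case $\sum_n J_n(x) = 1$, which is a quick check on any implementation of $J_n$.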


Example 27.4.1. The generating function for Bessel functions can be used to obtain a useful identity. First we note that

$$g(x+y,t) = g(x,t)\,g(y,t)$$

as the reader may easily verify. Expanding each side gives

$$\sum_{n=-\infty}^{\infty}t^nJ_n(x+y) = \sum_{k=-\infty}^{\infty}t^kJ_k(x)\sum_{m=-\infty}^{\infty}t^mJ_m(y) = \sum_{k=-\infty}^{\infty}\sum_{m=-\infty}^{\infty}t^{k+m}J_k(x)J_m(y).$$

In the last double sum, let $n = k + m$, so that $k = n - m$. Since there is no limitation on the value of either of the dummy indices, the limits of the new indices $n$ and $m$ are still $-\infty$ and $\infty$. Therefore,

$$\sum_{n=-\infty}^{\infty}t^nJ_n(x+y) = \sum_{n=-\infty}^{\infty}\sum_{m=-\infty}^{\infty}t^nJ_{n-m}(x)J_m(y) = \sum_{n=-\infty}^{\infty}t^n\left(\sum_{m=-\infty}^{\infty}J_{n-m}(x)J_m(y)\right).$$

Since each power of $t$ should have the same coefficient on both sides, we obtain the so-called addition theorem for Bessel functions:

$$J_n(x+y) = \sum_{m=-\infty}^{\infty}J_{n-m}(x)J_m(y) = \sum_{m=-\infty}^{\infty}J_m(x)J_{n-m}(y), \qquad (27.28)$$

where the last equality follows from the symmetry of $J_n(x+y)$ under the exchange of $x$ and $y$. ■
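The addition theorem (27.28) can be checked numerically in the same spirit. The sketch below truncates the sum over $m$ at $|m| \leq 30$, an assumption that is ample for the small arguments used. It evaluates $J_n$ for all integer orders from (27.10) and (27.14). Negative arguments are allowed because the series contains only integer powers of $x/2$.

```python
import math


def bessel_j(n, x, n_terms=40):
    """J_n(x) for any integer n and real x: series (27.10), with (27.14) for n < 0."""
    if n < 0:
        return (-1) ** n * bessel_j(-n, x, n_terms)
    return sum((-1) ** k / (math.factorial(k) * math.factorial(n + k)) * (x / 2) ** (2 * k + n)
               for k in range(n_terms))


def addition_sum(n, x, y, m_max=30):
    """Truncation of the right-hand side of (27.28): sum_m J_{n-m}(x) J_m(y)."""
    return sum(bessel_j(n - m, x) * bessel_j(m, y) for m in range(-m_max, m_max + 1))


print(bessel_j(2, 2.0), addition_sum(2, 1.2, 0.8))
```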


The Bessel generating function can also lead to some very important identities. In Equation (27.27), let $t = e^{i\theta}$ and use (18.14) to obtain

$$e^{ix\sin\theta} = \sum_{n=-\infty}^{\infty}e^{in\theta}J_n(x). \qquad (27.29)$$

generating function for Bessel functions

addition theorem for Bessel functions



Laplace’s Equation: Cylindrical Coordinates 


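This is a Fourier series expansion in $\theta$, as given in (18.20), whose coefficients are Bessel functions. To find these coefficients, we multiply both sides by $e^{-im\theta}$ and integrate from $-\pi$ to $\pi$ [see also Equation (18.22)]. The LHS gives

$$\text{LHS} = \int_{-\pi}^{\pi}e^{ix\sin\theta}e^{-im\theta}\,d\theta = \int_{-\pi}^{\pi}e^{i(x\sin\theta - m\theta)}\,d\theta.$$

For the RHS, we obtain

$$\sum_{n=-\infty}^{\infty}J_n(x)\int_{-\pi}^{\pi}e^{i(n-m)\theta}\,d\theta = \sum_{n=-\infty}^{\infty}J_n(x)\,2\pi\delta_{mn} = 2\pi J_m(x),$$

where we used the easily verifiable result [also see Equation (18.21)]:

$$\int_{-\pi}^{\pi}e^{i(n-m)\theta}\,d\theta = \begin{cases}0 & \text{if } n \neq m,\\ 2\pi & \text{if } n = m\end{cases} \;=\; 2\pi\delta_{mn}.$$

integral representation of the Bessel function

Equating the RHS and the LHS, we obtain

$$J_m(x) = \frac{1}{2\pi}\int_{-\pi}^{\pi}e^{i(x\sin\theta - m\theta)}\,d\theta. \qquad (27.30)$$

The reader may check that this can be reduced to

$$J_m(x) = \frac{1}{\pi}\int_0^{\pi}\cos(x\sin\theta - m\theta)\,d\theta, \qquad (27.31)$$

which is called Bessel’s integral.

Bessel's integral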
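Bessel’s integral gives an independent way to compute $J_m(x)$ for integer $m$. The sketch below, our own illustration, evaluates (27.31) with composite Simpson's rule and compares the result with the series (27.10). The number of subintervals is an arbitrary but ample choice for the arguments tested.

```python
import math


def bessel_j_series(m, x, n_terms=60):
    """J_m(x) for integer m >= 0 from the series (27.10)."""
    return sum((-1) ** k / (math.factorial(k) * math.factorial(m + k)) * (x / 2) ** (2 * k + m)
               for k in range(n_terms))


def bessel_j_integral(m, x, n=2000):
    """Bessel's integral (27.31): (1/pi) * integral over [0, pi] of cos(x sin(theta) - m theta),
    evaluated with composite Simpson's rule (n must be even)."""
    h = math.pi / n

    def f(theta):
        return math.cos(x * math.sin(theta) - m * theta)

    total = f(0.0) + f(math.pi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(i * h)
    return total * h / (3 * math.pi)


for m in range(4):
    print(m, bessel_j_series(m, 3.7), bessel_j_integral(m, 3.7))
```

Unlike the power series, the integral involves only bounded oscillatory terms, so it does not suffer from the cancellation that spoils the series in floating point at large $x$.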

Bessel functions can be written in terms of the confluent hypergeometric function. To see this, substitute $R(v) = v^\mu e^{-\eta v}f(v)$, with $\mu$ and $\eta$ to be determined, in Equation (27.6) to obtain

$$\frac{d^2f}{dv^2} + \left(\frac{2\mu+1}{v} - 2\eta\right)\frac{df}{dv} + \left[\frac{\mu^2 - m^2}{v^2} - \frac{\eta(2\mu+1)}{v} + \eta^2 + 1\right]f = 0,$$

which, if we set $\mu = m$ and $\eta = i$, reduces to

$$\frac{d^2f}{dv^2} + \left(\frac{2m+1}{v} - 2i\right)\frac{df}{dv} - \frac{(2m+1)i}{v}f = 0. \qquad (27.32)$$

relation between Bessel functions and confluent hypergeometric function

Making the further substitution $2iv = t$, and multiplying out by $t$, we obtain

$$t\frac{d^2f}{dt^2} + (2m + 1 - t)\frac{df}{dt} - \left(m + \tfrac{1}{2}\right)f = 0,$$

which is in the form of (11.27) with $\alpha = m + \tfrac{1}{2}$ and $\gamma = 2m + 1$. Thus, Bessel functions $J_m(x)$ can be written as constant multiples of $x^me^{-ix}\Phi(m + \tfrac{1}{2}, 2m + 1; 2ix)$. In fact,

$$J_m(x) = \frac{1}{\Gamma(m+1)}\left(\frac{x}{2}\right)^me^{-ix}\,\Phi\left(m + \tfrac{1}{2}, 2m + 1; 2ix\right). \qquad (27.33)$$
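Relation (27.33) can be tested by summing the confluent hypergeometric series directly. The sketch below assumes the standard series $\Phi(\alpha,\gamma;z) = \sum_k\frac{(\alpha)_k}{(\gamma)_k}\frac{z^k}{k!}$ for the solution of (11.27) that is regular at the origin. It uses complex arithmetic for the argument $2ix$ and compares the result with the series (27.10). For real $x$, the imaginary part of the right-hand side of (27.33) should vanish.

```python
import cmath
import math


def bessel_j(m, x, n_terms=60):
    """J_m(x) for integer m >= 0 from the series (27.10)."""
    return sum((-1) ** k / (math.factorial(k) * math.factorial(m + k)) * (x / 2) ** (2 * k + m)
               for k in range(n_terms))


def kummer_phi(alpha, gamma, z, n_terms=100):
    """Series Phi(alpha, gamma; z) = sum_k (alpha)_k / (gamma)_k * z^k / k!  (assumed normalization)."""
    term = complex(1.0)
    total = term
    for k in range(n_terms):
        term *= (alpha + k) / (gamma + k) * z / (k + 1)   # ratio of successive terms
        total += term
    return total


def bessel_j_via_kummer(m, x):
    """Right-hand side of Equation (27.33) for integer m >= 0 and real x."""
    prefactor = (x / 2) ** m / math.gamma(m + 1)
    return prefactor * cmath.exp(-1j * x) * kummer_phi(m + 0.5, 2 * m + 1, 2j * x)


print(bessel_j_via_kummer(1, 2.5), bessel_j(1, 2.5))
```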
27.5 Expansions in Bessel Functions


The orthogonality of Bessel functions can be useful in expanding other functions in terms of them. The basic idea is similar to the expansion in Fourier series and Legendre polynomials. If a function $f(\rho)$ is defined in the interval $(0, a)$, then we may write

$$f(\rho) = \sum_{n=1}^{\infty}c_nJ_m(x_{mn}\rho/a). \qquad (27.34)$$

The coefficients can be found by multiplying both sides by $\rho J_m(x_{mk}\rho/a)$ and integrating from zero to $a$. The reader may verify that this yields

$$c_n = \frac{2}{a^2J_{m+1}^2(x_{mn})}\int_0^af(\rho)J_m(x_{mn}\rho/a)\,\rho\,d\rho. \qquad (27.35)$$

Equations (27.34) and (27.35) are the analogues of Equations (10.38), (10.40), (10.41), and (10.42) for Fourier series, and Equations (26.46) and (26.47) for Legendre polynomials. Like those sets of equations, they can be used to expand functions in terms of Bessel functions of a specific order.

Example 27.5.1. The trigonometric functions can be expanded in Bessel functions with very little effort. In fact, setting $\theta = \pi/2$ in Equation (27.29) leads immediately to

$$e^{ix} = \sum_{n=-\infty}^{\infty}i^nJ_n(x)$$

or

$$\cos x + i\sin x = \sum_{k=-\infty}^{\infty}i^{2k}J_{2k}(x) + \sum_{k=-\infty}^{\infty}i^{2k+1}J_{2k+1}(x),$$

where we have separated the even and odd sums. For real $x$, the first sum is real and the second sum pure imaginary. Therefore,

expansion of sine and cosine in Bessel functions

$$\cos x = \sum_{k=-\infty}^{\infty}(-1)^kJ_{2k}(x) = \sum_{k=-\infty}^{-1}(-1)^kJ_{2k}(x) + J_0(x) + \sum_{k=1}^{\infty}(-1)^kJ_{2k}(x).$$

The first sum can be written as

$$\sum_{k=-\infty}^{-1}(-1)^kJ_{2k}(x) = \sum_{k=1}^{\infty}(-1)^{-k}J_{-2k}(x) = \sum_{k=1}^{\infty}(-1)^k(-1)^{2k}J_{2k}(x) = \sum_{k=1}^{\infty}(-1)^kJ_{2k}(x),$$

which is identical to the last sum. It follows that

$$\cos x = J_0(x) + 2\sum_{k=1}^{\infty}(-1)^kJ_{2k}(x). \qquad (27.36)$$

Similarly,

$$\sin x = 2\sum_{k=0}^{\infty}(-1)^kJ_{2k+1}(x) \qquad (27.37)$$

as the reader is urged to verify.
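The expansions (27.36) and (27.37) converge rapidly, because once $n$ exceeds $|x|$ the leading term of (27.10) makes $J_n(x)$ fall off roughly like $(x/2)^n/n!$. The Python sketch below compares partial sums with the built-in cosine and sine. The truncation at $k = 25$ is an illustrative choice.

```python
import math


def bessel_j(m, x, n_terms=60):
    """J_m(x) for integer m >= 0 and real x from the series (27.10)."""
    return sum((-1) ** k / (math.factorial(k) * math.factorial(m + k)) * (x / 2) ** (2 * k + m)
               for k in range(n_terms))


def cos_via_bessel(x, k_max=25):
    """Partial sum of (27.36): J_0(x) + 2 sum_{k=1}^{k_max} (-1)^k J_{2k}(x)."""
    return bessel_j(0, x) + 2 * sum((-1) ** k * bessel_j(2 * k, x) for k in range(1, k_max + 1))


def sin_via_bessel(x, k_max=25):
    """Partial sum of (27.37): 2 sum_{k=0}^{k_max} (-1)^k J_{2k+1}(x)."""
    return 2 * sum((-1) ** k * bessel_j(2 * k + 1, x) for k in range(k_max + 1))


for x in (0.3, 2.0, 7.5):
    print(x, cos_via_bessel(x) - math.cos(x), sin_via_bessel(x) - math.sin(x))
```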
Parseval relation

If we square Equation (27.34), multiply by $\rho$, and integrate from zero to $a$, we obtain

$$\int_0^af^2(\rho)\,\rho\,d\rho = \sum_{n=1}^{\infty}\sum_{k=1}^{\infty}c_nc_k\underbrace{\int_0^aJ_m(x_{mn}\rho/a)J_m(x_{mk}\rho/a)\,\rho\,d\rho}_{=\frac{1}{2}a^2J_{m+1}^2(x_{mn})\,\delta_{kn}\ \text{by (27.24)}}.$$

This leads to the so-called Parseval relation:

$$\int_0^af^2(\rho)\,\rho\,d\rho = \frac{1}{2}a^2\sum_{n=1}^{\infty}c_n^2J_{m+1}^2(x_{mn}), \qquad (27.38)$$

where $m$ is the (fixed) order of the Bessel functions used in the expansion (27.34). This $m$ can be chosen to make the integrations as simple as possible.

expansion of $\rho^k$ in terms of Bessel functions

Example 27.5.2. Let us find the expansion of $\rho^k$ in terms of Bessel functions. Equations (27.35) and (27.18) suggest expanding in terms of $J_k(x)$ because the integrals can be performed. Therefore, we write

$$\rho^k = \sum_{n=1}^{\infty}c_nJ_k(x_{kn}\rho/a),$$

where

$$c_n = \frac{2}{a^2J_{k+1}^2(x_{kn})}\int_0^a\rho^kJ_k(x_{kn}\rho/a)\,\rho\,d\rho = \frac{2}{a^2J_{k+1}^2(x_{kn})}\int_0^a\rho^{k+1}J_k(x_{kn}\rho/a)\,d\rho.$$

Introducing $y = x_{kn}\rho/a$ in the integral gives

$$c_n = \frac{2a^k}{x_{kn}^{k+2}J_{k+1}^2(x_{kn})}\int_0^{x_{kn}}y^{k+1}J_k(y)\,dy = \frac{2a^k}{x_{kn}^{k+2}J_{k+1}^2(x_{kn})}\,x_{kn}^{k+1}J_{k+1}(x_{kn}) = \frac{2a^k}{x_{kn}J_{k+1}(x_{kn})},$$

where we used (27.18) with $m$ replaced by $k + 1$. Thus, we have

$$\rho^k = 2a^k\sum_{n=1}^{\infty}\frac{J_k(x_{kn}\rho/a)}{x_{kn}J_{k+1}(x_{kn})}.$$

27.6 Physical Examples 

Our discussion of Laplace’s equation has led us to believe that trigonometric functions and Legendre polynomials are, respectively, the “natural” functions of Cartesian and spherical geometries. It should come as no surprise, then, that Bessel functions are the natural functions of cylindrical geometry.

As in the case of Cartesian and spherical coordinates, unless the symmetry of the problem simplifies the situation, the separation of Laplace’s equation results in two parameters leading to a double sum as in Example 25.2.4. The reason that we did not obtain double sums in spherical coordinates is that from the very beginning we assumed azimuthal symmetry. Thus, we expect a double summation in the most general solution of Laplace’s equation in cylindrical geometries. One of these sums is over $m$ which, as Equation (27.3) shows, appears in the argument of the sine and cosine functions. It also designates the order of the Bessel (or Neumann) function.

Figure 27.1: A conducting cylindrical can whose top has a potential given by $V(\rho,\varphi)$, with the rest of the surface grounded.

To understand the origin of the second summation, consider a cylindrical conducting can of radius $a$ and height $h$ (see Figure 27.1). Suppose that the potential at the top face varies as $V(\rho,\varphi)$ while the lateral surface and the bottom face are grounded. Let us find the electrostatic potential $\Phi$ at all points inside the can.

The general solution is a product of (27.3), (27.4), and (27.13):

$$\Phi(\rho,\varphi,z) = R(\rho)S(\varphi)Z(z).$$

Since $\Phi(\rho,\varphi,0) = 0$ for arbitrary $\rho$ and $\varphi$, we must have $Z(0) = 0$, yielding, to within a constant, $Z(z) = \sinh(lz)$.

Since $\Phi(0,\varphi,z)$ is finite, no Neumann function is allowed in the expansion, and, to within a constant, we have $R(\rho) = J_m(l\rho)$. Furthermore, since $\Phi(a,\varphi,z) = 0$ for arbitrary $\varphi$ and $z$, we must have

$$R(a) = J_m(la) = 0 \quad\Rightarrow\quad la = x_{mn} \quad\Rightarrow\quad l = \frac{x_{mn}}{a}, \qquad n = 1, 2, \ldots,$$

where, as before, $x_{mn}$ is the $n$th root of $J_m$.

We can now multiply $R$, $S$, and $Z$ and sum over all possible values of $m$ and $n$, keeping in mind that negative values of $m$ give terms that are linearly dependent on the corresponding positive values. The result is the so-called Fourier–Bessel series:

Fourier-Bessel series

$$\Phi(\rho,\varphi,z) = \sum_{m=0}^{\infty}\sum_{n=1}^{\infty}J_m\!\left(\frac{x_{mn}}{a}\rho\right)\sinh\!\left(\frac{x_{mn}}{a}z\right)(A_{mn}\cos m\varphi + B_{mn}\sin m\varphi), \qquad (27.39)$$
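where $A_{mn}$ and $B_{mn}$ are constants to be determined by the remaining boundary condition, which states that $\Phi(\rho,\varphi,h) = V(\rho,\varphi)$, or

$$V(\rho,\varphi) = \sum_{m=0}^{\infty}\sum_{n=1}^{\infty}J_m\!\left(\frac{x_{mn}}{a}\rho\right)\sinh\!\left(\frac{x_{mn}}{a}h\right)(A_{mn}\cos m\varphi + B_{mn}\sin m\varphi). \qquad (27.40)$$

Multiplying both sides by $\rho J_j(x_{jk}\rho/a)\cos j\varphi$ and integrating from zero to $2\pi$ in $\varphi$, and from zero to $a$ in $\rho$ gives $A_{jk}$. Changing cosine to sine and following the same steps yields $B_{jk}$. Switching back to $m$ and $n$, the reader may verify that

$$\begin{aligned}
A_{mn} &= \frac{2\displaystyle\int_0^{2\pi}d\varphi\int_0^ad\rho\,\rho V(\rho,\varphi)J_m\!\left(\frac{x_{mn}}{a}\rho\right)\cos m\varphi}{\pi a^2J_{m+1}^2(x_{mn})\sinh(x_{mn}h/a)},\\
B_{mn} &= \frac{2\displaystyle\int_0^{2\pi}d\varphi\int_0^ad\rho\,\rho V(\rho,\varphi)J_m\!\left(\frac{x_{mn}}{a}\rho\right)\sin m\varphi}{\pi a^2J_{m+1}^2(x_{mn})\sinh(x_{mn}h/a)},
\end{aligned} \qquad (27.41)$$

where we have used Equation (27.24). These formulas hold for $m \geq 1$. For $m = 0$, the $\varphi$-integral of $\cos^2 m\varphi$ equals $2\pi$ rather than $\pi$, so the factor 2 in the numerator of $A_{0n}$ must be replaced by 1 (and $B_{0n}$ does not appear).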

The important case of azimuthal symmetry requires special consideration. In such a case, the potential of the top surface $V(\rho,\varphi)$ must be independent of $\varphi$. Furthermore, since $S(\varphi)$ is constant,⁵ its derivative must vanish. Hence, the second equation in (27.1) yields $\mu = -m^2 = 0$. This zero value for $m$ reduces the double summation of (27.39) to a single sum, and we get

$$\Phi(\rho,z) = \sum_{n=1}^{\infty}A_nJ_0\!\left(\frac{x_{0n}}{a}\rho\right)\sinh\!\left(\frac{x_{0n}}{a}z\right). \qquad (27.42)$$

The coefficients $A_n$ can be obtained by setting $m = 0$ in the first equation of (27.41), keeping in mind that for $m = 0$ the $\varphi$-integration yields $2\pi$ instead of $\pi$:

$$A_n = \frac{2}{a^2J_1^2(x_{0n})\sinh(x_{0n}h/a)}\int_0^a\rho V(\rho)J_0\!\left(\frac{x_{0n}}{a}\rho\right)d\rho, \qquad (27.43)$$

where $V(\rho)$ is the $\varphi$-independent potential of the top surface.

Example 27.6.1. Suppose that the top face of a conducting cylindrical can is held at the constant potential $V_0$ while the lateral surface and the bottom face are grounded. We want to find the electrostatic potential at all points inside the can.

Since the potential of the top is independent of $\varphi$, azimuthal symmetry prevails, and Equation (27.43) gives

$$A_n = \frac{2V_0}{a^2J_1^2(x_{0n})\sinh(x_{0n}h/a)}\int_0^a\rho J_0\!\left(\frac{x_{0n}}{a}\rho\right)d\rho = \frac{2V_0}{x_{0n}J_1(x_{0n})\sinh(x_{0n}h/a)},$$

where we used Equation (27.18). The detail of calculating the integral is left as Problem 27.15 for the reader. Therefore,

$$\Phi(\rho,z) = 2V_0\sum_{n=1}^{\infty}\frac{J_0(x_{0n}\rho/a)\sinh(x_{0n}z/a)}{x_{0n}J_1(x_{0n})\sinh(x_{0n}h/a)}.$$

As a check, at $z = h$ this reduces to $V_0$ times the expansion of $\rho^0 = 1$ found in Example 27.5.2 (with $k = 0$), so that $\Phi(\rho,h) = V_0$, as required. ■
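The series of Example 27.6.1 can be summed numerically. The sketch below is illustrative only. It takes $a = h = V_0 = 1$ by default and keeps 40 terms. It evaluates $J_0$ and $J_1$ with the trapezoidal rule applied to Bessel's integral (27.30), which, unlike the power series, stays accurate for the large roots involved (the integrand is smooth and periodic, so the rule converges very quickly). It also rewrites the ratio of hyperbolic sines to avoid overflow. The checks confirm the boundary values, and that the interior potential lies strictly between 0 and $V_0$, as it must for a harmonic function with these boundary values.

```python
import math
from functools import lru_cache


def bessel_j(m, x, n_points=256):
    """J_m(x) for integer m: trapezoidal rule for (27.30) over one full period.
    Adequate here for |x| up to roughly 150."""
    total = 0.0
    for j in range(n_points):
        theta = -math.pi + 2.0 * math.pi * j / n_points
        total += math.cos(x * math.sin(theta) - m * theta)
    return total / n_points


@lru_cache(maxsize=None)
def bessel_zeros(m, count, step=0.2):
    """First `count` positive roots of J_m (sign scan plus bisection), cached as a tuple."""
    zeros = []
    a, fa = 0.5, bessel_j(m, 0.5)
    while len(zeros) < count:
        b = a + step
        fb = bessel_j(m, b)
        if fa * fb < 0:
            lo, hi, flo = a, b, fa
            for _ in range(55):
                mid = 0.5 * (lo + hi)
                fmid = bessel_j(m, mid)
                if flo * fmid <= 0:
                    hi = mid
                else:
                    lo, flo = mid, fmid
            zeros.append(0.5 * (lo + hi))
        a, fa = b, fb
    return tuple(zeros)


def sinh_ratio(k, z, h):
    """sinh(k z) / sinh(k h) for 0 <= z <= h and k > 0, rearranged to avoid overflow."""
    return (math.exp(k * (z - h)) - math.exp(-k * (z + h))) / (1.0 - math.exp(-2.0 * k * h))


def can_potential(rho, z, a=1.0, h=1.0, v0=1.0, n_terms=40):
    """Partial sum of Phi(rho, z) from Example 27.6.1."""
    total = 0.0
    for x in bessel_zeros(0, n_terms):
        total += bessel_j(0, x * rho / a) * sinh_ratio(x / a, z, h) / (x * bessel_j(1, x))
    return 2.0 * v0 * total


print(can_potential(0.0, 0.9), can_potential(0.5, 0.5))
```

Close to the top face ($z \to h$) the terms decay slowly, so many more terms would be needed there; the points used above are far enough from the top for 40 terms to suffice.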


⁵ $S(\varphi)$ must be a constant. Otherwise, the potential would depend on $\varphi$.
27.7 Problems

27.1. Derive (27.6) from the first equation of (27.1). 

27.2. Show that both equations in (27.17) give

$$J'_0(x) = -J_1(x).$$

27.3. Show that

$$[x^{-m}J_m(x)]' = -x^{-m}J_{m+1}(x)$$

and derive Equation (27.19).

27.4. Obtain the following equation from the Bessel DE:

$$\frac{d}{d\rho}\left[\rho(fg' - gf')\right] = (k^2 - l^2)\rho fg,$$

where $f$ and $g$ are solutions of two Bessel DEs for which the “constants” of the DEs are $k^2$ and $l^2$, respectively.

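27.5. (a) Show that for the Bessel generating function,

$$g(x+y,t) = g(x,t)\,g(y,t) \qquad\text{and}\qquad g(x,-t) = \frac{1}{g(x,t)}.$$

(b) Use the second relation to show that

$$\sum_{m=-\infty}^{\infty}J_{m-k}(x)J_m(x) = \delta_{k0} = \begin{cases}1 & \text{if } k = 0,\\ 0 & \text{if } k \neq 0.\end{cases}$$

Hint: Set the powers of $t$ equal on both sides of $1 = g(x,t)\,g(x,-t)$.
(c) In particular,

$$1 = \sum_{m=-\infty}^{\infty}J_m^2(x) = J_0^2(x) + 2\sum_{m=1}^{\infty}J_m^2(x),$$

showing that $|J_0(x)| \leq 1$ and $|J_m(x)| \leq 1/\sqrt{2}$ for $m > 0$.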

27.6. Derive Equation (27.31) from (27.30). 

27.7. Use Equation (27.31) to show that $J_{-m} = (-1)^mJ_m$.

27.8. Show that the substitution $R(v) = v^me^{-iv}f(v)$ turns Equation (27.6) into (27.32).

27.9. Using the orthogonality of Bessel functions derive Equation (27.35) 
from (27.34). 


27.10. Prove that

$$e^{ix\cos\theta} = \sum_{n=-\infty}^{\infty}i^ne^{in\theta}J_n(x).$$
27.11. Derive the expansion of the sine function in terms of Bessel functions.
Hint: See Example 27.5.1. 

27.12. The integral $\int_0^\infty e^{-ax}J_0(bx)\,dx$ may look intimidating, but leads to a very simple expression. To see this:

(a) Substitute for $J_0(bx)$ its series representation, and express the result of the integration in terms of the gamma function (a factorial, in this case).

(b) Use one of the results of Problem 11.1 to show that

$$\int_0^\infty e^{-ax}J_0(bx)\,dx = \frac{1}{a\sqrt{\pi}}\sum_{n=0}^{\infty}(-1)^n\frac{\Gamma(n+\frac{1}{2})}{n!}\left(\frac{b}{a}\right)^{2n}.$$

(c) Show that this result can also be expressed in terms of the hypergeometric function:

$$\int_0^\infty e^{-ax}J_0(bx)\,dx = \frac{1}{a}F\left(\tfrac{1}{2}, 1; 1; -\frac{b^2}{a^2}\right).$$

(d) Now use the result of Problem 11.4 to express the integral in a very simple form.

27.13. By writing the series representation of the Bessel function as in the previous problem, and using the result of Problem 11.2, show that for integer $m$

$$\int_0^\infty e^{-ax}J_m(bx)\,dx = \frac{1}{a\sqrt{\pi}}\left(\frac{b}{a}\right)^m\frac{\Gamma(m/2+1)\,\Gamma((m+1)/2)}{\Gamma(m+1)}\,F\left(\frac{m+1}{2}, \frac{m}{2}+1; m+1; -\frac{b^2}{a^2}\right).$$

27.14. Multiply both sides of Equation (27.40) by $\rho J_j(x_{jk}\rho/a)\cos j\varphi$ and integrate appropriately to obtain $A_{jk}$. Switch cosine to sine and do the same to find $B_{jk}$.

27.15. Use Equation (27.18) to show that

$$\int_0^a\rho J_0\!\left(\frac{x_{0n}}{a}\rho\right)d\rho = \frac{a^2}{x_{0n}}J_1(x_{0n}).$$

27.16. Use the Parseval relation (27.38) for $f(\rho) = \rho^k$ to obtain

$$\sum_{n=1}^{\infty}\frac{1}{x_{mn}^2} = \frac{1}{4(m+1)}$$

for any $m$. Hint: See Example 27.5.2.

27.17. A long heat-conducting cylinder of radius $a$ is composed of two halves (with semicircular cross sections) with an infinitesimal gap between them. The upper and lower halves of the cylinder are in contact with heat baths of temperatures $+T_0$ and $-T_0$, respectively. Find the temperature both inside and outside the cylinder.
27.18. A long heat-conducting cylinder of radius $a$ is composed of two halves (with semicircular cross sections) with an infinitesimal gap between them. The upper and lower halves of the cylinder are in contact with heat baths of temperatures $+T_1$ and $-T_1$, respectively. The cylinder is inside a larger cylinder (and coaxial with it) held at temperature $T_2$. Find the temperature inside the inner cylinder, between the two cylinders, and outside the outer cylinder.

27.19. A long conducting cylinder of radius $a$ is kept at potential $V_1$. The cylinder is inside a larger cylinder (and coaxial with it) held at potential $V_2$. Find the potential inside the inner cylinder, between the two cylinders, and outside the outer cylinder.




Chapter 28

Other PDEs of Mathematical Physics


Chapters 25, 26, and 27 discussed one of the most important PDEs of mathematical physics, Laplace’s equation. The techniques used in solving Laplace’s equation apply to all PDEs encountered in introductory physics. Since we have already spent a considerable amount of time on these techniques, we shall simply provide some illustrative examples of solving other PDEs.


28.1 The Heat Equation 


diffusion equation

The heat equation, sometimes also called the diffusion equation, was introduced in Chapter 22 [see Equation (22.3)]. The separation of variables $T(t,\mathbf{r}) = g(t)R(\mathbf{r})$ yields

$$\frac{\partial}{\partial t}[g(t)R(\mathbf{r})] = k^2\nabla^2[g(t)R(\mathbf{r})] \quad\Rightarrow\quad R(\mathbf{r})\frac{dg}{dt} = k^2g(t)\nabla^2R.$$

Dividing both sides by $g(t)R(\mathbf{r})$, we obtain

$$\frac{1}{g}\frac{dg}{dt} = k^2\frac{1}{R}\nabla^2R.$$

The LHS is a function of $t$, and the RHS a function of $\mathbf{r}$. The independence of these variables forces each side to be a constant. Calling this constant $-k^2\lambda$ for later convenience, we obtain an ODE in time and a PDE in the remaining variables:

$$\frac{dg}{dt} + k^2\lambda g = 0 \qquad\text{and}\qquad \nabla^2R + \lambda R = 0. \qquad (28.1)$$

The general solution of the first equation is

$$g(t) = Ae^{-k^2\lambda t} \qquad (28.2)$$
and that of the second equation can be obtained precisely by the methods of the last chapter. We illustrate this by some examples, but first we need to keep in mind that $\lambda$ is to be assumed positive; otherwise the exponential in Equation (28.2) will cause a growth of $g(t)$ (and, therefore, the temperature) beyond bounds.

λ of the heat equation is always positive

heat-conducting rod


28.1.1 Heat-Conducting Rod 

Let us consider a one-dimensional conducting rod with one end at the origin $x = 0$ and the other at $x = b$. The two ends are held at $T = 0$. Initially, at $t = 0$, we assume a temperature distribution on the rod given by some function $f(x)$. We want to calculate the temperature at time $t$ at any point $x$ on the rod.

Due to the one-dimensionality of the rod, the $y$- and $z$-dependence can be ignored, and the Laplacian is reduced to a second derivative in $x$. Thus, the second equation in (28.1) becomes

$$\frac{d^2X}{dx^2} + \lambda X = 0, \qquad (28.3)$$

where $X$ is a function of $x$ alone. The general solution of this equation is¹

$$X(x) = B\cos(\sqrt{\lambda}\,x) + C\sin(\sqrt{\lambda}\,x).$$

Since the two ends of the rod are held at $T = 0$, we have the boundary conditions $T(t,0) = 0 = T(t,b)$, which imply that $X(0) = 0 = X(b)$. These give $B = 0$ and²

$$\sin(\sqrt{\lambda}\,b) = 0 \quad\Rightarrow\quad \sqrt{\lambda}\,b = n\pi \quad\text{for } n = 1, 2, \ldots.$$

With a label n attached to A, the solution, and the constant multiplying it, 
we can now write 

I l i (subscripted) solution of the time equation is also simply obtained: 

9n(t) = A n e~ k2 ^^ 2t . 


/ nir \ 2 


/ 717T \ 

(t) 

and 

X n (x) = C n sin y— x J 


This leads to a general solution of the form

$$T(t,x) = \sum_{n=1}^{\infty} B_n e^{-(n\pi k/b)^2 t}\sin\left(\frac{n\pi}{b}x\right), \tag{28.4}$$

where $B_n = A_n C_n$. The initial condition $f(x) = T(0,x)$ yields

$$f(x) = \sum_{n=1}^{\infty} B_n \sin(n\pi x/b),$$

1 The reader may check that the only solution for $\lambda = 0$ is the trivial solution.

2 Consult Section 25.2.

which is a Fourier series from which we can calculate the coefficients


$$B_n = \frac{2}{b}\int_0^b \sin\left(\frac{n\pi}{b}x\right) f(x)\,dx.$$


Thus if we know the initial temperature distribution on the rod [the function $f(x)$], we can determine the temperature of the rod for all time. For instance, if the initial temperature distribution of the rod is uniform, say $T_0$, then

$$B_n = \frac{2T_0}{n\pi}\left[1 - (-1)^n\right].$$


It follows that only the odd $n$'s survive, and if we set $n = 2m + 1$, we obtain

$$B_{2m+1} = \frac{4T_0}{\pi(2m+1)}$$

and

$$T(t,x) = \frac{4T_0}{\pi}\sum_{m=0}^{\infty}\frac{e^{-[(2m+1)\pi k/b]^2 t}}{2m+1}\sin\left[\frac{(2m+1)\pi}{b}x\right].$$


This distribution of temperature can be evaluated numerically for all time for any heat conductor whose $k$ is known. Note that the exponential in the sum eventually causes the temperature to drop to zero (the fixed temperature of the two end points). This conclusion is independent of the initial temperature distribution of the rod, as Equation (28.4) indicates.
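The series above is easy to evaluate numerically. The Python sketch below is our own illustration rather than anything from the text: it truncates the series for a uniform initial temperature $T_0$ after a fixed number of terms, and the values of $T_0$, $b$, $k$, and the cutoff are arbitrary assumptions.

```python
import math

def rod_temperature(t, x, T0=1.0, b=1.0, k=1.0, terms=2000):
    """Truncated series for T(t, x) on a rod of length b whose ends are held at T = 0
    and whose initial temperature is uniformly T0 (only odd n = 2m + 1 contribute)."""
    total = 0.0
    for m in range(terms):
        n = 2 * m + 1
        decay = math.exp(-((n * math.pi * k / b) ** 2) * t)
        total += decay * math.sin(n * math.pi * x / b) / n
    return 4.0 * T0 / math.pi * total

# At t = 0 the series reproduces T0 in the interior (up to truncation error).
print(rod_temperature(0.0, 0.5))
# The ends stay at T = 0 for all t.
print(rod_temperature(0.1, 0.0))
# At later times the midpoint temperature decays toward zero.
print(rod_temperature(0.2, 0.5), rod_temperature(1.0, 0.5))
```

At $t = 0$ the truncated series converges slowly near the ends (a Gibbs-type effect), but for $t > 0$ the exponential factors make only the first few terms matter.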


28.1.2 Heat Conduction in a Rectangular Plate 

As a more complicated example involving a second spatial variable, consider a rectangular heat-conducting plate with sides of length $a$ and $b$, all held at $T = 0$. Assume that at time $t = 0$ the temperature has a distribution function $f(x,y)$. Let us find the variation of temperature for all points $(x,y)$ at all times $t > 0$.

The spatial part of the heat equation for this problem is

$$\frac{\partial^2 R}{\partial x^2} + \frac{\partial^2 R}{\partial y^2} + \lambda R = 0.$$


A separation of variables, $R(x,y) = X(x)Y(y)$, and the usual procedure lead to the following equation:

$$\frac{1}{X}\frac{d^2X}{dx^2} + \frac{1}{Y}\frac{d^2Y}{dy^2} + \lambda = 0.$$

This leads to the following two ODEs:

$$\frac{d^2X}{dx^2} + \mu X = 0, \qquad \frac{d^2Y}{dy^2} + \nu Y = 0, \qquad \lambda = \mu + \nu.$$


conduction of heat
in a rectangular
plate


Because the boundary conditions require the solutions to vanish at both ends of each side, the relevant solutions of these equations are trigonometric (sine) functions. The four boundary conditions

$$T(0,y,t) = T(a,y,t) = T(x,0,t) = T(x,b,t) = 0$$

determine the specific form of the solutions as well as the indexed constants of separation:

$$X_n(x) = A_n\sin\left(\frac{n\pi}{a}x\right) \quad\text{and}\quad \mu_n = \left(\frac{n\pi}{a}\right)^2 \quad\text{for } n = 1, 2, \ldots,$$
$$Y_m(y) = B_m\sin\left(\frac{m\pi}{b}y\right) \quad\text{and}\quad \nu_m = \left(\frac{m\pi}{b}\right)^2 \quad\text{for } m = 1, 2, \ldots.$$
So, $\lambda$ becomes a double-indexed quantity:

$$\lambda_{mn} = \mu_n + \nu_m = \left(\frac{n\pi}{a}\right)^2 + \left(\frac{m\pi}{b}\right)^2.$$


The solution to the $g$ equation can be expressed as $g_{mn}(t) = C_{mn}e^{-k^2\lambda_{mn}t}$. Putting everything together, we obtain

$$T(x,y,t) = \sum_{n=1}^{\infty}\sum_{m=1}^{\infty} A_{mn}e^{-k^2\lambda_{mn}t}\sin\left(\frac{n\pi}{a}x\right)\sin\left(\frac{m\pi}{b}y\right),$$


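where $A_{mn} = A_nB_mC_{mn}$ is an arbitrary constant. To determine it, we impose the initial condition $T(x,y,0) = f(x,y)$. This yields

$$f(x,y) = \sum_{n=1}^{\infty}\sum_{m=1}^{\infty} A_{mn}\sin\left(\frac{n\pi}{a}x\right)\sin\left(\frac{m\pi}{b}y\right),$$

from which we find the coefficients $A_{mn}$ (see Theorem 25.2.5):

$$A_{mn} = \frac{4}{ab}\int_0^a dx\int_0^b dy\, f(x,y)\sin\left(\frac{n\pi}{a}x\right)\sin\left(\frac{m\pi}{b}y\right).$$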
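One way to check the coefficient formula is to apply it to a case that can also be done by hand. For a uniform initial temperature $f(x,y) = T_0$ (our illustrative choice), the two integrals give $A_{mn} = 16T_0/(\pi^2 nm)$ when $n$ and $m$ are both odd and $A_{mn} = 0$ otherwise. The Python sketch below approximates the double integral with a midpoint rule; the plate dimensions, the grid size, and the helper names are arbitrary assumptions.

```python
import math

def plate_coefficient(f, n, m, a=2.0, b=1.0, N=200):
    """Midpoint-rule estimate of
    A_mn = (4/ab) * int_0^a dx int_0^b dy f(x, y) sin(n pi x / a) sin(m pi y / b)."""
    hx, hy = a / N, b / N
    total = 0.0
    for i in range(N):
        x = (i + 0.5) * hx
        sx = math.sin(n * math.pi * x / a)
        for j in range(N):
            y = (j + 0.5) * hy
            total += f(x, y) * sx * math.sin(m * math.pi * y / b)
    return 4.0 / (a * b) * total * hx * hy

def uniform_plate_coefficient(n, m, T0=1.0):
    """Closed form for f(x, y) = T0: 16 T0 / (pi^2 n m) if n and m are both odd, else 0."""
    if n % 2 == 1 and m % 2 == 1:
        return 16.0 * T0 / (math.pi ** 2 * n * m)
    return 0.0

T0 = 1.0
for n, m in [(1, 1), (1, 3), (2, 1)]:
    print(n, m, plate_coefficient(lambda x, y: T0, n, m), uniform_plate_coefficient(n, m, T0))
```

The integrand is passed as a function so that any other initial distribution $f(x,y)$ can be tried in the same way.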


28.1.3 Heat Conduction in a Circular Plate 


In this example, we consider a circular plate of radius $a$ whose rim is held at $T = 0$ and whose initial surface temperature is characterized by a function $f(\rho,\varphi)$. We are seeking the temperature distribution on the plate for all time. The spatial part of the heat equation in $z$-independent cylindrical coordinates,³ appropriate for a circular plate, is

$$\frac{1}{\rho}\frac{\partial}{\partial\rho}\left(\rho\frac{\partial R}{\partial\rho}\right) + \frac{1}{\rho^2}\frac{\partial^2 R}{\partial\varphi^2} + \lambda R = 0,$$


3 See the discussion of Subsection 22.3. 
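which, after the separation of variables, $R(\rho,\varphi) = \mathcal{R}(\rho)S(\varphi)$, reduces to

$$S(\varphi) = A\cos m\varphi + B\sin m\varphi \quad\text{for } m = 0, 1, 2, \ldots,$$
$$\frac{d^2\mathcal{R}}{d\rho^2} + \frac{1}{\rho}\frac{d\mathcal{R}}{d\rho} + \left(\lambda - \frac{m^2}{\rho^2}\right)\mathcal{R} = 0.$$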


The solution of the last (Bessel) equation, which is well defined for $\rho = 0$ and vanishes at $\rho = a$, is

$$\mathcal{R}(\rho) = CJ_m\left(\frac{x_{mn}}{a}\rho\right) \quad\text{with}\quad \sqrt{\lambda} = \frac{x_{mn}}{a}, \quad n = 1, 2, \ldots,$$

where, as usual, $x_{mn}$ is the $n$th root of $J_m$. We see that $\lambda$ is a double-indexed quantity. The time equation (28.2) has a solution of the form

$$g_{mn}(t) = D_{mn}e^{-k^2\lambda_{mn}t} = D_{mn}e^{-k^2(x_{mn}^2/a^2)t}.$$


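Multiplying the three solutions and summing over the two indices yields the most general solution

$$T(\rho,\varphi,t) = \sum_{m=0}^{\infty}\sum_{n=1}^{\infty} J_m\left(\frac{x_{mn}}{a}\rho\right)e^{-(kx_{mn}/a)^2 t}\left(A_{mn}\cos m\varphi + B_{mn}\sin m\varphi\right).$$

The coefficients are determined from the initial condition

$$f(\rho,\varphi) = T(\rho,\varphi,0) = \sum_{m=0}^{\infty}\sum_{n=1}^{\infty} J_m\left(\frac{x_{mn}}{a}\rho\right)\left(A_{mn}\cos m\varphi + B_{mn}\sin m\varphi\right).$$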

Except for the hyperbolic sine term, this equation is identical to (27.40). 
Therefore, the coefficients are given by expressions similar to Equation (27.41). 
In the case at hand, we get 

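$$A_{mn} = \frac{2}{\pi a^2 J_{m+1}^2(x_{mn})}\int_0^{2\pi}d\varphi\int_0^a d\rho\,\rho f(\rho,\varphi)J_m\left(\frac{x_{mn}}{a}\rho\right)\cos m\varphi,$$
$$B_{mn} = \frac{2}{\pi a^2 J_{m+1}^2(x_{mn})}\int_0^{2\pi}d\varphi\int_0^a d\rho\,\rho f(\rho,\varphi)J_m\left(\frac{x_{mn}}{a}\rho\right)\sin m\varphi.$$

In particular, if the initial temperature distribution is independent of $\varphi$, then only the term with $m = 0$ contributes,⁴ and we get

$$T(\rho,t) = \sum_{n=1}^{\infty} A_n J_0\left(\frac{x_{0n}}{a}\rho\right)e^{-(kx_{0n}/a)^2 t}.$$

With $f(\rho)$ representing the $\varphi$-independent initial temperature distribution, the coefficient $A_n$ is found to be

$$A_n = \frac{2}{a^2J_1^2(x_{0n})}\int_0^a \rho f(\rho)J_0\left(\frac{x_{0n}}{a}\rho\right)d\rho.$$

Note that the temperature distribution does not develop any $\varphi$ dependence at later times.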
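For a uniform initial temperature $f(\rho) = T_0$, the integral can be done in closed form with the identity $\int_0^x sJ_0(s)\,ds = xJ_1(x)$, which gives $A_n = 2T_0/[x_{0n}J_1(x_{0n})]$. The Python sketch below checks this numerically. To stay self-contained, it evaluates $J_0$ and $J_1$ from Bessel's integral $J_m(x) = \frac{1}{\pi}\int_0^\pi\cos(m\theta - x\sin\theta)\,d\theta$ and locates the roots $x_{0n}$ by bisection; these numerical choices, the plate radius $a = 1$, and the helper names are our own assumptions rather than the text's.

```python
import math

def bessel_j(m, x, steps=64):
    """J_m(x) for integer m from Bessel's integral, using the trapezoidal rule
    (which converges very quickly for this smooth, periodic-type integrand)."""
    h = math.pi / steps
    total = 0.5 * (1.0 + math.cos(m * math.pi - x * math.sin(math.pi)))
    for i in range(1, steps):
        theta = i * h
        total += math.cos(m * theta - x * math.sin(theta))
    return total * h / math.pi

def bessel_zeros(m, count, step=0.1):
    """First `count` positive roots of J_m: bracket by a coarse scan, then bisect."""
    zeros, x, f_x = [], step, bessel_j(m, step)
    while len(zeros) < count:
        x_next = x + step
        f_next = bessel_j(m, x_next)
        if f_x * f_next < 0:
            lo, hi, f_lo = x, x_next, f_x
            for _ in range(60):
                mid = 0.5 * (lo + hi)
                f_mid = bessel_j(m, mid)
                if f_lo * f_mid <= 0:
                    hi = mid
                else:
                    lo, f_lo = mid, f_mid
            zeros.append(0.5 * (lo + hi))
        x, f_x = x_next, f_next
    return zeros

def fourier_bessel_coefficient(f, x0n, a=1.0, N=500):
    """A_n = 2 / (a^2 J_1(x0n)^2) * int_0^a rho f(rho) J_0(x0n rho / a) d rho (midpoint rule)."""
    h = a / N
    integral = 0.0
    for i in range(N):
        rho = (i + 0.5) * h
        integral += rho * f(rho) * bessel_j(0, x0n * rho / a)
    return 2.0 / (a * a * bessel_j(1, x0n) ** 2) * integral * h

T0 = 1.0
for x0n in bessel_zeros(0, 3):
    print(x0n, fourier_bessel_coefficient(lambda rho: T0, x0n), 2 * T0 / (x0n * bessel_j(1, x0n)))
```

In practice a library routine for Bessel functions and their zeros would replace the two helpers; they are written out here only so the sketch runs without extra dependencies.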


4 See the discussion after Equation (27.41). 
time-independent

Schrodinger 

equation 



Erwin Schrodinger 
1887-1961 


28.2 The Schrodinger Equation 

Chapter 22 separated the time part of the Schrodinger equation from its space part, resulting in the following two equations:

$$\nabla^2\psi + \frac{2\mu}{\hbar^2}[E - V(\mathbf{r})]\psi = 0 \quad\text{and}\quad \frac{dT}{dt} = -\frac{iE}{\hbar}T, \tag{28.5}$$

where $E$, the energy of the quantum particle, is the constant of separation.⁵ We have also used $\psi$ instead of $R$, because the latter is usually reserved to denote a function of the radial variable $r$ (or $\rho$) when separating the variables of the Laplacian in spherical (or cylindrical) coordinates.

The solution of the time part is easily obtained; it is simply

$$T(t) = Ae^{-iEt/\hbar} = Ae^{-i\omega t}, \quad\text{where}\quad \omega = \frac{E}{\hbar}. \tag{28.6}$$

It is the solution of the first equation in (28.5), the time-independent Schrodinger equation, that will take up most of our time in this section.

Erwin Schrodinger was a student at Vienna from 1906 and taught there for ten 
years from 1910 to 1920 with a break for military service in World War I. While 
at Vienna he worked on radioactivity, proving the statistical nature of radioactive 
decay. He also made important contributions to the kinetic theory of solids, studying 
the dynamics of crystal lattices. 

After leaving Vienna in 1920 he was appointed to a professorship in Jena, where 
he stayed for a short time. He then moved to Stuttgart, and later to Breslau before 
accepting the chair of theoretical physics at Zurich in late 1921. During these years 
of changing from one place to another, Schrodinger studied physiological optics, in 
particular the theory of color vision. 

Zurich was to be the place where Schrodinger made his most important contributions. From 1921 he studied atomic structure. In 1924 he began to study quantum
statistics soon after reading de Broglie’s thesis which was to have a major influence 
on his thinking. 

Schrodinger published very important work relating to wave mechanics and the 
general theory of relativity in a series of papers in 1926. Wave mechanics, proposed 
by Schrodinger in these papers, was the second formulation of quantum theory, the 
first being matrix mechanics due to Heisenberg. For this work Schrodinger was 
awarded the Nobel prize in 1933. 

Schrodinger went to Berlin in 1927 where he succeeded Planck as the chair of 
theoretical physics and he became a colleague of Einstein’s. 

Although he was a Catholic, Schrodinger decided in 1933 that he couldn’t live 
in a country in which the persecution of Jews had become a national policy. He left, 
spending time in Britain where he was at the University of Oxford from 1933 until 
1936. In 1936 he went to Austria and spent the years 1936-1938 in Graz. However, 
the advancing Nazi threat caught up with him again in Austria and he fled again, 
this time settling in Dublin, Ireland, in 1939. 

5 We used a in place of E in Chapter 22. 
His study of Greek science and philosophy is summarized in Nature and the
Greeks (1954) which he wrote while in Dublin. Another important book written 
during this period was What Is Life (1944) which led to progress in biology. He 
remained in Dublin until he retired in 1956 when he returned to Vienna. 

During his last few years Schrodinger remained interested in mathematical physics 
and continued to work on general relativity, unified field theory, and meson 
physics. 


28.2.1 Quantum Harmonic Oscillator 

quantum harmonic
oscillator

As an important example of the Schrodinger equation, we consider a particle in a one-dimensional harmonic oscillator potential.

The one-dimensional time-independent Schrodinger equation for a particle of mass $\mu$ in a potential $V(x)$ is

$$\frac{d^2\psi}{dx^2} + \frac{2\mu}{\hbar^2}[E - V(x)]\psi = 0,$$

where $E$ is the total energy of the particle.

For a harmonic oscillator (with the "spring" constant $k$),

$$V(x) = \tfrac{1}{2}kx^2 = \tfrac{1}{2}\mu\omega^2x^2, \qquad \omega = \sqrt{k/\mu},$$

and

$$\frac{d^2\psi}{dx^2} - \frac{\mu^2\omega^2}{\hbar^2}x^2\psi + \frac{2\mu E}{\hbar^2}\psi = 0.$$

To simplify the equation, we make the change of variables $x = \sqrt{\hbar/(\mu\omega)}\,y$. The equation then becomes

$$\psi'' - y^2\psi + \frac{2E}{\hbar\omega}\psi = 0, \tag{28.7}$$

where the primes indicate differentiation with respect to $y$.

We could solve this DE by the Frobenius power series method. However, tradition suggests that we first look at the behavior of the solution as $y \to \infty$. In this limit, we can ignore the last term in (28.7), and the DE becomes

$$\psi'' - y^2\psi \approx 0,$$

which can easily be shown to have (approximate) solutions of the form $e^{\pm y^2/2}$. Since the positive exponent diverges at infinity, we retain only the solution with the negative exponent. Following the traditional steps, we consider a solution of the form $\psi(y) = H(y)\exp(-y^2/2)$, in which the asymptotic function has been separated. Substitution of this separated form of $\psi$ in (28.7) results in

$$H'' - 2yH' + \lambda H = 0, \quad\text{where}\quad \lambda = \frac{2E}{\hbar\omega} - 1. \tag{28.8}$$
Hermite

differential 

equation 


recursion relation 
for Hermite DE 


Physics dictates 
mathematics! 


This is the Hermite differential equation. 

To solve the Hermite DE by the Frobenius method, we assume an expansion of the form $H(y) = \sum_{n=0}^{\infty} c_ny^n$ with

$$H'(y) = \sum_{n=1}^{\infty} nc_ny^{n-1} = \sum_{n=0}^{\infty}(n+1)c_{n+1}y^n,$$
$$H''(y) = \sum_{n=1}^{\infty} n(n+1)c_{n+1}y^{n-1} = \sum_{n=0}^{\infty}(n+1)(n+2)c_{n+2}y^n,$$

where in the last step of each equation, we changed the dummy index to $m = n - 1$, and in the end, replaced $m$ with $n$. Substituting in Equation (28.8) gives

$$\underbrace{\sum_{n=0}^{\infty}\left[(n+1)(n+2)c_{n+2} + \lambda c_n\right]y^n}_{\equiv\,S_1} - 2\sum_{n=0}^{\infty}(n+1)c_{n+1}y^{n+1} = 0. \tag{28.9}$$

Now separate the zeroth term of the first sum to obtain

$$S_1 = 2c_2 + \lambda c_0 + \sum_{n=1}^{\infty}\left[(n+1)(n+2)c_{n+2} + \lambda c_n\right]y^n.$$

Changing the dummy index to $m = n - 1$ yields

$$S_1 = 2c_2 + \lambda c_0 + \sum_{m=0}^{\infty}\left[(m+2)(m+3)c_{m+3} + \lambda c_{m+1}\right]y^{m+1},$$

whose dummy index can be switched back to $n$. Substitution of this last result in (28.9) now yields

$$2c_2 + \lambda c_0 + \sum_{n=0}^{\infty}\left[(n+2)(n+3)c_{n+3} + \lambda c_{n+1} - 2(n+1)c_{n+1}\right]y^{n+1} = 0.$$


Setting the coefficients of powers of $y$ equal to zero, we obtain $2c_2 + \lambda c_0 = 0$ and

$$c_{n+3} = \frac{2(n+1) - \lambda}{(n+2)(n+3)}c_{n+1} \quad\text{for } n \ge 0,$$

or, replacing $n$ with $n - 1$ and noting that the resulting recursion relation is true for $n = 0$ as well (it reproduces $2c_2 + \lambda c_0 = 0$), we obtain

$$c_{n+2} = \frac{2n - \lambda}{(n+1)(n+2)}c_n, \qquad n \ge 0. \tag{28.10}$$
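Iterating (28.10) directly shows when the series stops. In the Python sketch below (an illustration; the function name and the sample values of $\lambda$ are our own choices), exact rational arithmetic generates $c_0, c_1, \ldots$ from given $c_0$ and $c_1$. The even part of the series terminates when $\lambda$ is a multiple of 4 and the odd part when $\lambda - 2$ is, while for a value such as $\lambda = 5$ neither part terminates.

```python
from fractions import Fraction

def hermite_series_coefficients(lam, c0, c1, n_max):
    """Return [c_0, ..., c_{n_max}] from c_{n+2} = (2n - lam) / ((n+1)(n+2)) * c_n, Eq. (28.10).

    lam should be an integer or a Fraction so that the arithmetic stays exact."""
    c = [Fraction(c0), Fraction(c1)]
    for n in range(n_max - 1):
        c.append(Fraction(2 * n - lam, (n + 1) * (n + 2)) * c[n])
    return c

# lambda = 4 (m = 2), even solution: 1 - 2y^2, proportional to H_2(y) = 4y^2 - 2
print(hermite_series_coefficients(4, 1, 0, 8))
# lambda = 6 (m = 3), odd solution: y - (2/3)y^3, proportional to H_3(y) = 8y^3 - 12y
print(hermite_series_coefficients(6, 0, 1, 8))
```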


The ratio test shows easily that the series converges for all values of $y$. However, on physical grounds, i.e., the demand that $\lim_{y\to\infty}\psi(y) = 0$, the series must be truncated. Let us see why.

Construction of Hermite Polynomials

The fact that we are interested in the behavior of $\psi$ (and therefore, $H$) as $x$ (or $y$) goes to infinity permits us to concentrate on the very large powers of $y$ in the series for $H(y)$. Hence, separating the even and odd parts of the series, we may write

$$\begin{aligned}
H(y) &= \sum_{k=0}^{\infty} c_{2k}y^{2k} + \sum_{k=0}^{\infty} c_{2k+1}y^{2k+1} \\
&= P_{2M+1}(y) + \sum_{k=M+1}^{\infty} c_{2k}y^{2k} + \sum_{k=M+1}^{\infty} c_{2k+1}y^{2k+1} \\
&= P_{2M+1}(y) + \sum_{k=0}^{\infty} c_{2k+2M+2}y^{2k+2M+2} + \sum_{k=0}^{\infty} c_{2k+2M+3}y^{2k+2M+3},
\end{aligned}\tag{28.11}$$

where $P_{2M+1}(y)$ is a generic polynomial (of degree $2M+1$) obtained by adding all the "small" powers of the series, and $M$ is a very large number. Now note that for very large $n$, the recursion relation yields

$$c_{n+2} = \frac{2n - \lambda}{(n+1)(n+2)}c_n \approx \frac{2n}{n\cdot n}c_n = \frac{2}{n}c_n.$$

A few iterations give

$$c_n \approx \frac{2}{n-2}c_{n-2} \approx \cdots \approx \frac{2^k}{(n-2)(n-4)\cdots(n-2k)}c_{n-2k}.$$

In particular,

$$c_{2k+N} \approx \frac{2^k}{(2k+N-2)(2k+N-4)\cdots(N)}c_N. \tag{28.12}$$

To find the coefficients in Equation (28.11), first let $N = 2M + 2$ and obtain

$$\begin{aligned}
c_{2k+2M+2} &\approx \frac{2^k}{(2k+2M)(2k+2M-2)\cdots(2M+2)}c_{2M+2} = \frac{2^k}{[2(k+M)][2(k+M-1)]\cdots[2(M+1)]}c_{2M+2} \\
&= \frac{1}{(k+M)(k+M-1)\cdots(M+1)}c_{2M+2} = \frac{M!}{(k+M)!}c_{2M+2}.
\end{aligned}\tag{28.13}$$


Similarly

$$\begin{aligned}
c_{2k+2M+3} &\approx \frac{2^k(2M+1)!!}{(2k+2M+1)!!}c_{2M+3} = 2^k\,\frac{[2(M+1)]!/[2^{M+1}(M+1)!]}{[2(k+M+1)]!/[2^{k+M+1}(k+M+1)!]}\,c_{2M+3} \\
&= \frac{2^{2k}[2(M+1)]!\,(k+M+1)!}{(M+1)!\,[2(k+M+1)]!}c_{2M+3},
\end{aligned}\tag{28.14}$$

In particular, $M$ is very large compared to $\lambda$ of Equation (28.10).
where we used the result of Problem 11.1. By using the Stirling approximation (11.6), the reader may verify that

$$c_{2k+2M+3} \approx \frac{(M+1)!}{(k+M+1)!}c_{2M+3}. \tag{28.15}$$

With the coefficients given in terms of two constants ($c_{2M+2}$ and $c_{2M+3}$), Equation (28.11) becomes

$$\begin{aligned}
H(y) &\approx P_{2M+1}(y) + c_{2M+2}M!\,y^2\sum_{k=0}^{\infty}\frac{y^{2k+2M}}{(k+M)!} + c_{2M+3}(M+1)!\,y\sum_{k=0}^{\infty}\frac{y^{2k+2M+2}}{(k+M+1)!} \\
&= P_{2M+1}(y) + c_{2M+2}M!\,y^2\sum_{j=M}^{\infty}\frac{y^{2j}}{j!} + c_{2M+3}(M+1)!\,y\sum_{j=M+1}^{\infty}\frac{y^{2j}}{j!}.
\end{aligned}\tag{28.16}$$

The first sum over $j$ can be reexpressed as follows:

$$\sum_{j=M}^{\infty}\frac{y^{2j}}{j!} = \sum_{j=0}^{\infty}\frac{y^{2j}}{j!} - \sum_{i=0}^{M-1}\frac{y^{2i}}{i!} = e^{y^2} - Q_{2M-2}(y),$$

where $Q_{2M-2}(y)$ is a polynomial of degree $2M-2$ in $y$. The second sum in (28.16) can be expressed similarly. Adding all the polynomials together, we finally get

$$H(y) \approx \mathcal{P}_{2M+1}(y) + \underbrace{c_{2M+2}M!}_{\equiv\,\beta_M}\,y^2e^{y^2} + \underbrace{c_{2M+3}(M+1)!}_{\equiv\,\alpha_M}\,ye^{y^2} = \mathcal{P}_{2M+1}(y) + (\alpha_My + \beta_My^2)e^{y^2}. \tag{28.17}$$


Let us now go back to $\psi(y)$ and note that

$$\psi(y) = H(y)e^{-y^2/2} \approx \underbrace{\mathcal{P}_{2M+1}(y)e^{-y^2/2}}_{\to\,0\text{ as }y\to\infty} + \underbrace{(\alpha_My + \beta_My^2)e^{y^2/2}}_{\text{diverges as }y\to\infty}$$


Truncation of the 
infinite series gives 
the quantization 
of harmonic 
oscillator energy 
levels. 


Hermite 

polynomials 


because any exponential decay outweighs any polynomial growth. It follows that, if $H(y)$ is an infinite series, $\psi(y)$ will diverge at infinity. From a physical standpoint this means that the quantum particle inside the harmonic oscillator potential well has an infinite probability of being found at infinity!⁷

To avoid this unrealistic conclusion, we have to reexamine $H(y)$. The case of Legendre polynomials tells us that the infinite series needs to be truncated. This will take place only if the numerator of the recursion relation vanishes for some $n$, i.e., if $\lambda = 2m$ for some nonnegative integer $m$. An immediate consequence of such a truncation is the famous quantization of the harmonic oscillator energy:

$$2m = \lambda = \frac{2E}{\hbar\omega} - 1 \quad\Longrightarrow\quad E = \left(m + \tfrac{1}{2}\right)\hbar\omega.$$
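The difference between a truncated and a nontruncated series is easy to see numerically. The Python sketch below, our own illustration, sums the even series of (28.10) with $c_0 = 1$, building each term $c_ny^n$ from the previous one to avoid overflow, and multiplies by $e^{-y^2/2}$. For $\lambda = 0$ (that is, $m = 0$ and $E = \hbar\omega/2$) the series stops after one term and $\psi$ decays; for the arbitrarily chosen nonquantized value $\lambda = 1$, every coefficient beyond $c_0$ is negative and $|\psi|$ grows rapidly with $y$. The cutoff `n_max` is an assumption that is adequate for the moderate values of $y$ used here.

```python
import math

def psi_even(lam, y, n_max=400):
    """psi(y) = H(y) exp(-y^2/2), with H the even series of Eq. (28.10) and c_0 = 1.

    Each term c_n y^n is obtained from the previous one, so no huge powers are formed."""
    term = total = 1.0  # the n = 0 term, c_0 y^0
    for n in range(0, n_max, 2):
        term *= (2 * n - lam) / ((n + 1) * (n + 2)) * y * y
        if term == 0.0:
            break  # the series has terminated (or its terms have underflowed)
        total += term
    return total * math.exp(-y * y / 2)

for y in (1.0, 3.0, 6.0):
    print(y, psi_even(0, y), psi_even(1, y))
```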

The polynomials obtained by truncating the infinite series are called the Hermite polynomials. We now construct them. With $\lambda = 2m$, the recursion relation (28.10) can be written as

7 The Copenhagen interpretation, the only valid interpretation of quantum mechanics, states that $|\psi(x)|^2$ is the probability density for finding the particle at $x$.

$$c_n = \frac{2(n-m-2)}{n(n-1)}c_{n-2} = -\frac{2(m+2-n)}{n(n-1)}c_{n-2}, \qquad m \ge n \ge 2.$$


The upper limit for $n$ is due to the truncation mentioned above. After a few iterations, the pattern will emerge and the reader may verify that

$$c_n = (-1)^k\frac{2^k(m+2-n)(m+4-n)\cdots(m+2k-n)}{n(n-1)\cdots(n-2k+1)}c_{n-2k}. \tag{28.18}$$

We need to consider the two cases of even and odd $n$ separately. For $n = 2k$, we get

$$c_{2k} = (-1)^k\frac{2^k m(m-2)\cdots(m+2-2k)}{(2k)!}c_0. \tag{28.19}$$


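Now, since the numerator of (28.10) must vanish beyond some integer, and since for even $n = 2k$ this numerator is $4k - \lambda$, we must have $\lambda = 4j$, or $m = \lambda/2 = 2j$, for some integer $j$. Then, the reader may check that

$$c_{2k} = (-1)^k\frac{2^{2k}j!}{(2k)!(j-k)!}c_0 \tag{28.20}$$

and

$$H_{2j}(y) = c_0^{(j)}\sum_{k=0}^{j}\frac{(-1)^k2^{2k}j!}{(2k)!(j-k)!}y^{2k}, \tag{28.21}$$

where we have given the constant a superscript to distinguish among the $c_0$'s of different $j$'s. The odd polynomials can be obtained similarly:

$$H_{2j+1}(y) = c_1^{(j)}\sum_{k=0}^{j}\frac{(-1)^k2^{2k+1}j!}{(2k+1)!(j-k)!}y^{2k+1}. \tag{28.22}$$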


The constants are determined by convention. To adhere to this convention, we define

$$c_0^{(j)} = \frac{(-1)^j(2j)!}{j!}, \qquad c_1^{(j)} = \frac{(-1)^j(2j+1)!}{j!}.$$

The reader may check that, with these constants, the Hermite polynomials of all degrees (even or odd) can be concisely written as

$$H_n(y) = \sum_{r=0}^{[n/2]}\frac{(-1)^r n!}{(n-2r)!\,r!}(2y)^{n-2r}, \tag{28.23}$$

where $[a]$, for any real $a$, stands for the largest integer less than or equal to $a$.
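Equation (28.23) translates directly into code. The Python sketch below (our illustration, with hypothetical helper names) builds the integer coefficient list of $H_n$ from (28.23) and cross-checks it against the three-term recurrence $H_{k+1} = 2yH_k - 2kH_{k-1}$, a standard identity that is not derived in this section and is used here only as an independent check.

```python
from math import factorial

def hermite_coeffs(n):
    """Integer coefficients [a_0, ..., a_n] of H_n(y) from Eq. (28.23)."""
    coeffs = [0] * (n + 1)
    for r in range(n // 2 + 1):
        power = n - 2 * r
        # n!/((n-2r)! r!) is an integer, so exact integer division is safe here
        coeffs[power] = (-1) ** r * (factorial(n) // (factorial(power) * factorial(r))) * 2 ** power
    return coeffs

def hermite_by_recurrence(n):
    """Same coefficients from H_{k+1} = 2y H_k - 2k H_{k-1}, starting from H_0 = 1, H_1 = 2y."""
    prev, cur = [1], [0, 2]
    if n == 0:
        return prev
    for k in range(1, n):
        nxt = [0] * (k + 2)
        for i, a in enumerate(cur):
            nxt[i + 1] += 2 * a       # contribution of 2y * H_k
        for i, a in enumerate(prev):
            nxt[i] -= 2 * k * a       # contribution of -2k * H_{k-1}
        prev, cur = cur, nxt
    return cur

for n in range(5):
    print(n, hermite_coeffs(n))
```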


Orthogonality of Hermite Polynomials 

The Hermite polynomials satisfy an orthogonality relation resembling that of 
the Legendre polynomials. We can obtain this relation by multiplying the DE 
weight function
for Hermite 
polynomials 


orthogonality of 

Hermite 

polynomials 


for $H_m(x)$ by $H_n(x)e^{-x^2}$, and the DE for $H_n(x)$ by $H_m(x)e^{-x^2}$, and subtracting. The factor $e^{-x^2}$, the so-called weight function, may appear artificial in this derivation, but an in-depth analysis of the classical orthogonal polynomials,⁸ of which Hermite and Legendre polynomials are examples, reveals that such weight functions are necessary. The reason we did not see such a factor in Legendre polynomials was that for them, the weight function is unity. At any rate, the result of the above suggested calculation will be

$$(H_m''H_n - 2xH_m'H_n - H_n''H_m + 2xH_n'H_m)e^{-x^2} + (2m - 2n)H_mH_ne^{-x^2} = 0.$$

The reader may easily verify that the first term is the derivative of

$$(H_m'H_n - H_n'H_m)e^{-x^2},$$


so that

$$\frac{d}{dx}\left[(H_m'H_n - H_n'H_m)e^{-x^2}\right] + (2m - 2n)H_mH_ne^{-x^2} = 0,$$

and if we integrate this over the entire real line, we obtain

$$\left[(H_m'H_n - H_n'H_m)e^{-x^2}\right]_{-\infty}^{+\infty} + (2m - 2n)\int_{-\infty}^{+\infty}H_m(x)H_n(x)e^{-x^2}dx = 0.$$

The first term vanishes because of the exponential factor. It now follows that if $m \ne n$, then

$$\int_{-\infty}^{+\infty}H_m(x)H_n(x)e^{-x^2}dx = 0. \tag{28.24}$$


Generating Function for Hermite Polynomials 

We constructed the generating function for Legendre polynomials in Chapter 26. Here we want to do the same thing for Hermite polynomials. By definition, the generating function must have an expansion of the form

$$g(t,x) = \sum_{n=0}^{\infty} a_nt^nH_n(x),$$

where $a_n$ is a constant to be determined. Differentiate both sides with respect to $x$, treating $t$ as a constant:

$$\frac{\partial g}{\partial x} = \sum_{n=1}^{\infty} a_nt^nH_n'(x).$$

The sum starts at 1 because $H_0'(x) = 0$. Use the result of Problem 28.7 to obtain

$$\frac{\partial g}{\partial x} = \sum_{n=1}^{\infty} a_nt^n(2n)H_{n-1}(x) = 2t\sum_{n=1}^{\infty} na_nt^{n-1}H_{n-1}(x).$$

8 We have no space to go into the details of the theory of classical orthogonal polynomials, but the interested reader can find a unified discussion of them in Hassani, S. Mathematical Physics: A Modern Introduction to Its Foundations, Springer-Verlag, 1999, Chapter 7.

Now choose the constant $a_n$ in such a way that it satisfies the recursion relation $na_n = a_{n-1}$. It then follows that
na n = a n -i. It then follows that 

$$\frac{\partial g}{\partial x} = 2t\sum_{n=1}^{\infty}a_{n-1}t^{n-1}H_{n-1}(x) = 2t\sum_{m=0}^{\infty}a_mt^mH_m(x) = 2tg.$$

Thus

$$\frac{1}{g}\frac{\partial g}{\partial x} = 2t \;\Longrightarrow\; \ln g = 2tx + \ln C(t) \;\Longrightarrow\; g(t,x) = C(t)e^{2tx},$$

where the "constant" of integration has been given the possibility of depending on the other variable, $t$. To find this constant of integration, we first determine $a_n$:

$$a_n = \frac{a_{n-1}}{n} = \frac{a_{n-2}}{n(n-1)} = \cdots = \frac{a_{n-k}}{n(n-1)\cdots(n-k+1)} = \cdots = \frac{a_0}{n!}.$$

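Using our results obtained so far, we get

$$C(t)e^{2tx} = a_0\sum_{n=0}^{\infty}\frac{t^n}{n!}H_n(x) \quad\Longrightarrow\quad C(t)e^{2tx} = \sum_{n=0}^{\infty}\frac{t^n}{n!}H_n(x),$$

where in the last step we absorbed $a_0$ (really $1/a_0$) in the "constant" $C(t)$. To determine $C(t)$, evaluate both sides of the equation at $x = 0$ and use the result of Problem 28.8. This yields

$$C(t) = \sum_{n=0}^{\infty}\frac{t^n}{n!}H_n(0) = \sum_{k=0}^{\infty}\frac{t^{2k}}{(2k)!}\,\frac{(-1)^k(2k)!}{k!} = \sum_{k=0}^{\infty}\frac{(-t^2)^k}{k!} = e^{-t^2}.$$

It follows that

$$e^{2tx-t^2} = \sum_{n=0}^{\infty}\frac{t^n}{n!}H_n(x). \tag{28.25}$$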


We now summarize what we have done:

Box 28.2.1. The $n$th coefficient of the Maclaurin series expansion of the generating function $g(t,x) = e^{2tx-t^2}$ about $t = 0$ is $H_n(x)/n!$. Specifically,

$$H_n(x) = \left.\frac{\partial^n}{\partial t^n}e^{2tx-t^2}\right|_{t=0}. \tag{28.26}$$


generating 
function for 
Hermite 
polynomials 
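As a numerical sanity check of (28.25), one can compare a partial sum of $\sum_n t^nH_n(x)/n!$ with $e^{2tx-t^2}$ at a few sample points. In the Python sketch below (our own illustration), $H_n$ is evaluated with the standard recurrence $H_{k+1} = 2xH_k - 2kH_{k-1}$, which is not derived in this section and is assumed here; the sample points and the 40-term cutoff are arbitrary choices.

```python
import math

def hermite(n, x):
    """H_n(x) via the standard recurrence H_{k+1} = 2x H_k - 2k H_{k-1}."""
    h_prev, h = 1.0, 2.0 * x
    if n == 0:
        return h_prev
    for k in range(1, n):
        h_prev, h = h, 2.0 * x * h - 2.0 * k * h_prev
    return h

def generating_partial_sum(t, x, terms=40):
    """Partial sum of sum_n t^n H_n(x) / n!, the right-hand side of (28.25)."""
    return sum(t ** n / math.factorial(n) * hermite(n, x) for n in range(terms))

for t, x in [(0.3, 0.7), (-0.5, 1.2)]:
    print(generating_partial_sum(t, x), math.exp(2 * t * x - t * t))
```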


Charles Hermite
1822-1901

We can put the result above to immediate good use. Let us square both sides of Equation (28.25), multiply by $e^{-x^2}$, and integrate the result from $-\infty$ to $+\infty$. For the LHS, we have


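$$\begin{aligned}
\text{LHS} &= \int_{-\infty}^{+\infty}e^{2(2tx-t^2)}e^{-x^2}dx = e^{-2t^2}\int_{-\infty}^{+\infty}e^{-x^2+4tx}dx \\
&= e^{-2t^2}\int_{-\infty}^{+\infty}e^{-(x-2t)^2+4t^2}dx = e^{2t^2}\int_{-\infty}^{+\infty}e^{-u^2}du \\
&= \sqrt{\pi}\,e^{2t^2} = \sqrt{\pi}\sum_{n=0}^{\infty}\frac{2^nt^{2n}}{n!},
\end{aligned}$$

where we introduced $u = x - 2t$ for the integration variable, and used the result of Example 3.3.1.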

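To square the RHS, we need to write it as the product of two infinite sums using two different dummy indices. Therefore,

$$\begin{aligned}
\text{RHS} &= \int_{-\infty}^{+\infty}\left(\sum_{n=0}^{\infty}\frac{t^n}{n!}H_n(x)\right)\left(\sum_{m=0}^{\infty}\frac{t^m}{m!}H_m(x)\right)e^{-x^2}dx \\
&= \sum_{m,n=0}^{\infty}\frac{t^{m+n}}{m!\,n!}\underbrace{\int_{-\infty}^{+\infty}H_m(x)H_n(x)e^{-x^2}dx}_{=\,0\text{ unless }m=n\text{ by (28.24)}} \\
&= \sum_{n=0}^{\infty}\frac{t^{2n}}{(n!)^2}\int_{-\infty}^{+\infty}[H_n(x)]^2e^{-x^2}dx.
\end{aligned}$$

Comparing the coefficients of $t^{2n}$ on the LHS and the RHS, we conclude that

$$\int_{-\infty}^{+\infty}[H_n(x)]^2e^{-x^2}dx = \sqrt{\pi}\,2^nn!.$$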


We can combine this result and Equation (28.24) and write

$$\int_{-\infty}^{+\infty}H_m(x)H_n(x)e^{-x^2}dx = \sqrt{\pi}\,2^nn!\,\delta_{mn}, \tag{28.27}$$

where $\delta_{mn}$ is the Kronecker delta, which is 1 if $m = n$ and 0 if $m \ne n$.
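Equation (28.27) can be spot-checked numerically. The Python sketch below (our illustration) evaluates $H_n$ with the standard recurrence $H_{k+1} = 2xH_k - 2kH_{k-1}$ (assumed, not derived in this section) and integrates $H_mH_ne^{-x^2}$ with the trapezoidal rule on $[-L, L]$. The cutoff $L = 12$ and the step count are arbitrary choices that are adequate for small $m$ and $n$ because the Gaussian factor makes the neglected tails negligible.

```python
import math

def hermite(n, x):
    """H_n(x) via the standard recurrence H_{k+1} = 2x H_k - 2k H_{k-1}."""
    h_prev, h = 1.0, 2.0 * x
    if n == 0:
        return h_prev
    for k in range(1, n):
        h_prev, h = h, 2.0 * x * h - 2.0 * k * h_prev
    return h

def weighted_inner_product(m, n, L=12.0, steps=2000):
    """Trapezoidal estimate of int_{-L}^{L} H_m(x) H_n(x) exp(-x^2) dx."""
    dx = 2.0 * L / steps
    total = 0.0
    for i in range(steps + 1):
        x = -L + i * dx
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * hermite(m, x) * hermite(n, x) * math.exp(-x * x)
    return total * dx

for m in range(4):
    print([round(weighted_inner_product(m, n), 6) for n in range(4)])
print([math.sqrt(math.pi) * 2 ** n * math.factorial(n) for n in range(4)])
```

The off-diagonal entries come out as zero to rounding error, and the diagonal entries match $\sqrt{\pi}\,2^nn!$.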


Charles Hermite, one of the most eminent French mathematicians of the nineteenth century, was particularly distinguished for the clean elegance and high artistic quality of his work. As a student, he courted disaster by neglecting his routine assigned work to study the classic masters of mathematics; and though he nearly failed his examinations, he became a first-rate creative mathematician while still in his early twenties. In 1870 he was appointed to a professorship at the Sorbonne, where he trained a whole generation of well-known French mathematicians, including Picard, Borel, and Poincaré.

The character of his mind is suggested by a remark of Poincaré: “Talk with M. Hermite. He never evokes a concrete image, yet you soon perceive that the most abstract entities are to him like living creatures.” He disliked geometry, but
was strongly attracted to number theory and analysis, and his favorite subject was 
elliptic functions, where these two fields touch in many remarkable ways. Earlier in 
the century the Norwegian genius Abel had proved that the general equation of the 
fifth degree cannot be solved by functions involving only rational operations and 
root extractions. One of Hermite’s most surprising achievements (in 1858) was to 
show that this equation can be solved by elliptic functions. 

His 1873 proof of the transcendence of e was another high point of his career.⁹ If he had been willing to dig even deeper into this vein, he could probably have disposed of π as well, but apparently he had had enough of a good thing. As he wrote to a friend, “I shall risk nothing on an attempt to prove the transcendence of the number π. If others undertake this enterprise, no one will be happier than I
at their success, but believe me, my dear friend, it will not fail to cost them some 
efforts.” As it turned out, Lindemann’s proof nine years later rested on extending 
Hermite’s method. 

Several of his purely mathematical discoveries had unexpected applications many 
years later to mathematical physics. For example, the Hermitian forms and matrices 
that he invented in connection with certain problems of number theory turned out 
to be crucial for Heisenberg’s 1925 formulation of quantum mechanics, and Hermite 
polynomials are useful in solving Schrodinger's wave equation.


28.2.2 Quantum Particle in a Box 

quantum particle
in a box

The behavior of an atomic particle of mass $\mu$ confined in a rectangular box with sides $a$, $b$, and $c$ (an infinite three-dimensional potential well) is governed by the Schrodinger equation for a free particle, i.e., $V = 0$. With this assumption, the first equation of (28.5) becomes

$$\nabla^2\psi + \frac{2\mu E}{\hbar^2}\psi = 0.$$


A separation of variables, $\psi(x,y,z) = X(x)Y(y)Z(z)$, yields the ODEs

$$\frac{d^2X}{dx^2} + \lambda X = 0, \qquad \frac{d^2Y}{dy^2} + \sigma Y = 0, \qquad \frac{d^2Z}{dz^2} + \nu Z = 0,$$

with $\lambda + \sigma + \nu = 2\mu E/\hbar^2$ (see Example 22.2.1).

These equations, together with the boundary conditions

$$\begin{aligned}
\psi(0,y,z) = \psi(a,y,z) = 0 &\;\Longrightarrow\; X(0) = 0 = X(a), \\
\psi(x,0,z) = \psi(x,b,z) = 0 &\;\Longrightarrow\; Y(0) = 0 = Y(b), \\
\psi(x,y,0) = \psi(x,y,c) = 0 &\;\Longrightarrow\; Z(0) = 0 = Z(c),
\end{aligned}\tag{28.28}$$

9 Transcendental numbers are those that are not roots of polynomials with integer 
coefficients. 
lead to the following solutions:


quantum 

tunneling 


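$$\begin{aligned}
X_n(x) &= \sin\left(\frac{n\pi}{a}x\right), & \lambda_n &= \left(\frac{n\pi}{a}\right)^2 & &\text{for } n = 1, 2, \ldots, \\
Y_m(y) &= \sin\left(\frac{m\pi}{b}y\right), & \sigma_m &= \left(\frac{m\pi}{b}\right)^2 & &\text{for } m = 1, 2, \ldots, \\
Z_l(z) &= \sin\left(\frac{l\pi}{c}z\right), & \nu_l &= \left(\frac{l\pi}{c}\right)^2 & &\text{for } l = 1, 2, \ldots,
\end{aligned}$$

where the multiplicative constants have been suppressed.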

The BCs in Equation (28.28) arise from the demand that the probability 
of finding the particle be continuous and that it be zero outside the box. This 
is not true for a particle inside a finite potential well, in which case the particle 
has a nonzero probability of “tunneling” out of the well. 

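The time equation has a solution of the form

$$T(t) = e^{-i\omega_{lmn}t}, \quad\text{where}\quad \omega_{lmn} = \frac{\hbar\pi^2}{2\mu}\left[\left(\frac{n}{a}\right)^2 + \left(\frac{m}{b}\right)^2 + \left(\frac{l}{c}\right)^2\right].$$

The solution of the Schrodinger equation that is consistent with the boundary conditions is, therefore,

$$\psi(x,y,z,t) = \sum_{l,m,n=1}^{\infty}A_{lmn}e^{-i\omega_{lmn}t}\sin\left(\frac{n\pi}{a}x\right)\sin\left(\frac{m\pi}{b}y\right)\sin\left(\frac{l\pi}{c}z\right).$$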

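The constants $A_{lmn}$ are determined by the initial shape $\psi(x,y,z,0)$ of the wave function. In fact, setting $t = 0$, multiplying by the product of the three sine functions in the three variables, and integrating over appropriate intervals for each coordinate, we obtain

$$A_{lmn} = \frac{8}{abc}\int_0^a dx\int_0^b dy\int_0^c dz\,\psi(x,y,z,0)\sin\left(\frac{n\pi}{a}x\right)\sin\left(\frac{m\pi}{b}y\right)\sin\left(\frac{l\pi}{c}z\right).$$

The energy of the particle is

$$E = \hbar\omega_{lmn} = \frac{\hbar^2\pi^2}{2\mu}\left(\frac{n^2}{a^2} + \frac{m^2}{b^2} + \frac{l^2}{c^2}\right).$$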


Each set of three positive integers $(n,m,l)$ represents a quantum state of the particle. For a cube, $a = b = c = L$, and the energy of the particle is

$$E = \frac{\hbar^2\pi^2}{2\mu L^2}(n^2 + m^2 + l^2) = \frac{\hbar^2\pi^2}{2\mu V^{2/3}}(n^2 + m^2 + l^2), \tag{28.29}$$

where $V = L^3$ is the volume of the box. The ground state $(1,1,1)$ has energy $E = 3\hbar^2\pi^2/(2\mu V^{2/3})$ and is nondegenerate (only one state corresponds to this energy). However, many of the higher levels are degenerate. For instance, the three distinct states $(1,1,2)$, $(1,2,1)$, and $(2,1,1)$ all correspond to the same energy, $E = 6\hbar^2\pi^2/(2\mu V^{2/3})$. The degeneracy generally increases with larger values of $n$, $m$, and $l$.
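A short enumeration makes the degeneracies concrete. The Python sketch below (our illustration; the cutoff `n_max` is arbitrary, and only sums up to `n_max**2 + 2` are guaranteed to be fully counted) tallies how many triples $(n,m,l)$ give each value of $n^2 + m^2 + l^2$, i.e., each energy in units of $\hbar^2\pi^2/(2\mu V^{2/3})$. It reproduces the single ground state at 3 and the threefold level at 6, and it also shows that the growth is not strictly monotonic: the value 12 is reached only by $(2,2,2)$.

```python
from collections import Counter
from itertools import product

def cube_degeneracies(n_max):
    """Map each value of n^2 + m^2 + l^2 to the number of states (n, m, l),
    with 1 <= n, m, l <= n_max, that produce it."""
    return Counter(n * n + m * m + l * l
                   for n, m, l in product(range(1, n_max + 1), repeat=3))

levels = cube_degeneracies(5)
for value in sorted(levels):
    if value <= 27:  # complete for n_max = 5, since 27 <= 5**2 + 2
        print(value, levels[value])
```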

Equation (28.29) can be written as

$$n^2 + m^2 + l^2 = R^2, \quad\text{where}\quad R^2 = \frac{2\mu EV^{2/3}}{\hbar^2\pi^2}.$$

This looks like the equation of a sphere in the $nml$-space. If $R$ is large, the number of states contained within the sphere of radius $R$ (the number of states with energy less than or equal to $E$) is approximately the volume of the first octant¹⁰ of the sphere. If $N$ is the number of such states, we have

$$N \approx \frac{1}{8}\cdot\frac{4\pi}{3}R^3 = \frac{\pi}{6}\left(\frac{2\mu EV^{2/3}}{\hbar^2\pi^2}\right)^{3/2} = \frac{\pi}{6}\left(\frac{2\mu E}{\hbar^2\pi^2}\right)^{3/2}V.$$
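The octant-volume estimate can be checked by brute force: count the positive-integer triples with $n^2 + m^2 + l^2 \le R^2$ and compare with $\pi R^3/6$. In the Python sketch below (our illustration; the radii are arbitrary integers), the ratio approaches 1 as $R$ grows, since the neglected boundary effects scale like $R^2$ while the volume scales like $R^3$.

```python
import math

def count_states(R):
    """Number of positive-integer triples (n, m, l) with n^2 + m^2 + l^2 <= R^2 (R an integer)."""
    r2 = R * R
    count = 0
    for n in range(1, R + 1):
        for m in range(1, R + 1):
            rest = r2 - n * n - m * m
            if rest >= 1:
                count += math.isqrt(rest)  # number of l >= 1 with l^2 <= rest
    return count

def octant_estimate(R):
    """Volume of the first octant of a sphere of radius R: pi R^3 / 6."""
    return math.pi * R ** 3 / 6

for R in (10, 50, 200):
    print(R, count_states(R), count_states(R) / octant_estimate(R))
```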




Thus the density of states (the number of states per unit volume) is

$$n = \frac{N}{V} = \frac{\pi}{6}\left(\frac{2\mu}{\hbar^2\pi^2}\right)^{3/2}E^{3/2}. \tag{28.30}$$

This is an important formula in solid-state physics, because the energy $E$ is (with minor modifications required by spin) the Fermi energy. If the Fermi energy is denoted by $E_F$, Equation (28.30) gives $E_F = a\,n^{2/3}$, where $a$ is some constant.


28.2.3 Hydrogen Atom 

When an electron moves around a nucleus containing $Z$ protons, the potential energy of the system is $V(r) = -Ze^2/r$. In units in which $\hbar$ and the mass of the electron are set equal to unity, the time-independent Schrodinger equation of (28.5) gives

$$\nabla^2\psi + \left(2E + \frac{2Ze^2}{r}\right)\psi = 0.$$

The radial part of this equation is given by the first equation in (22.16) with $f(r) = 2E + 2Ze^2/r$. Defining $u = rR(r)$, we may write

$$\frac{d^2u}{dr^2} + \left(\lambda + \frac{a}{r} - \frac{\alpha}{r^2}\right)u = 0, \tag{28.31}$$


where $\lambda = 2E$ and $a = 2Ze^2$. This equation can be further simplified by defining $r = kz$ ($k$ is an arbitrary constant to be determined later):

$$\frac{d^2u}{dz^2} + \left(\lambda k^2 + \frac{ak}{z} - \frac{\alpha}{z^2}\right)u = 0.$$


density of states 


Fermi energy 


10 This is because $n$, $m$, and $l$ are all positive.

Choosing $\lambda k^2 = -\frac{1}{4}$ and introducing $\beta = a/(2\sqrt{-\lambda})$ yields

$$\frac{d^2u}{dz^2} + \left(-\frac{1}{4} + \frac{\beta}{z} - \frac{\alpha}{z^2}\right)u = 0. \tag{28.32}$$


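Let us examine the two limiting cases of $z \to \infty$ and $z \to 0$. For the first case, Equation (28.32) reduces to

$$\frac{d^2u}{dz^2} - \frac{1}{4}u = 0 \quad\Longrightarrow\quad u = e^{-z/2},$$

where we have discarded the solution $e^{z/2}$, which diverges at infinity. For the second case the dominant term will be $\alpha/z^2$ and the DE becomes

$$\frac{d^2u}{dz^2} - \frac{\alpha}{z^2}u = 0,$$

for which we try a solution of the form $z^m$ with $m$ to be determined by substitution:

$$\frac{d^2u}{dz^2} = m(m-1)z^{m-2} \;\Longrightarrow\; m(m-1)z^{m-2} - \frac{\alpha}{z^2}z^m = 0 \;\Longrightarrow\; \alpha = m(m-1).$$

Recalling from Theorem 26.2.1 that $\alpha = l(l+1)$, we determine $m$ to be $l + 1$ (the other root, $m = -l$, would make $R = u/r$ diverge at the origin). Factoring out these two limits, we seek a solution for (28.32) of the form

$$u(z) = z^{l+1}e^{-z/2}f(z).$$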

Substitution of this function in (28.32) gives a new DE:

$$f'' + \left[\frac{2(l+1)}{z} - 1\right]f' - \frac{l+1-\beta}{z}f = 0.$$

Multiplying by $z$ gives

$$zf'' + [2(l+1) - z]f' - (l+1-\beta)f = 0, \tag{28.33}$$

which is a confluent hypergeometric DE [see Equation (11.27)]. Therefore, as the reader may verify, $f$ is proportional to $\Phi(l+1-\beta, 2l+2; z)$. Thus, the solution of (28.31) can be written as

$$u(z) = Cz^{l+1}e^{-z/2}\Phi(l+1-\beta, 2l+2; z).$$


Laguerre Polynomials 

An argument similar to that used in the discussion of a quantum harmonic oscillator will reveal that the product $e^{-z/2}\Phi(l+1-\beta, 2l+2; z)$ will diverge as $z \to \infty$ unless the power series representing $\Phi$ terminates (becomes a polynomial). This takes place only if (see Box 11.2.2)

$$l + 1 - \beta = -N \tag{28.34}$$

for some integer $N \ge 0$. In that case we obtain the Laguerre polynomials

$$L_N^j(x) = \frac{\Gamma(N+j+1)}{\Gamma(N+1)\Gamma(j+1)}\Phi(-N, j+1; x), \quad\text{where}\quad j = 2l+1, \tag{28.35}$$

where the factor in front of $\Phi$ is a standardization factor.

Condition (28.34) is the quantization rule for the energy levels of a hydrogen-like atom. Writing everything in terms of the original parameters, redefining $\beta$ as $\beta = N + l + 1 \equiv n$ to reflect its integer character, and restoring all the $\mu$'s and the $\hbar$'s, we obtain the energy levels of a hydrogen-like atom:

$$E_n = -\frac{Z^2\mu e^4}{2\hbar^2n^2} = -\frac{Z^2\alpha^2\mu c^2}{2n^2},$$

where $\alpha = e^2/(\hbar c) \approx 1/137$ is the fine structure constant.
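Putting in numbers is a quick sanity check. The Python sketch below uses $\alpha \approx 1/137.036$ and the electron rest energy $mc^2 \approx 510{,}999$ eV (standard values assumed here, not quoted from the text) and approximates the reduced mass $\mu$ by the electron mass; it gives the familiar $E_1 \approx -13.6$ eV for hydrogen.

```python
FINE_STRUCTURE = 1 / 137.035999          # alpha (assumed standard value)
ELECTRON_REST_ENERGY_EV = 510_998.95     # m c^2 for the electron in eV (assumed standard value)

def hydrogen_like_energy(n, Z=1, rest_energy_ev=ELECTRON_REST_ENERGY_EV):
    """E_n = -Z^2 alpha^2 (mu c^2) / (2 n^2) in eV; mu is approximated by the electron
    mass unless a different rest energy is passed in."""
    return -Z ** 2 * FINE_STRUCTURE ** 2 * rest_energy_ev / (2 * n ** 2)

for n in range(1, 4):
    print(n, round(hydrogen_like_energy(n), 4))
print("He+ ground state:", round(hydrogen_like_energy(1, Z=2), 3))
```

Using the reduced mass instead of the electron mass would shift these values only slightly, since the nucleus is much heavier than the electron.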
The radial wave functions can now be written as

$$R_{nl}(r) = \frac{u(r)}{r} = Cr^le^{-Zr/(na_0)}\Phi\left(-n+l+1, 2l+2; \frac{2Zr}{na_0}\right),$$

where $a_0 = \hbar^2/(me^2) = 0.529\times10^{-8}$ cm is the Bohr radius.

The explicit form of Laguerre polynomials can be obtained by substituting the truncated confluent hypergeometric series [see Equation (11.28)] in (28.35):

$$L_N^j(x) = \frac{\Gamma(N+j+1)}{\Gamma(N+1)\Gamma(j+1)}\,\frac{\Gamma(j+1)}{\Gamma(-N)}\sum_{k=0}^{N}\frac{\Gamma(-N+k)}{\Gamma(j+1+k)\Gamma(k+1)}x^k.$$

We now use the result of Problem 11.4 and write
$$\frac{\Gamma(k-N)}{\Gamma(-N)}=(-1)^kN(N-1)\cdots(N-k+1)=\frac{(-1)^kN!}{(N-k)!}.$$


It follows that
$$L_N^j(x)=\frac{\Gamma(N+j+1)}{\Gamma(N+1)}\sum_{k=0}^{N}\frac{(-1)^kN!}{(N-k)!\,\Gamma(j+1+k)\,\Gamma(k+1)}x^k.$$


Simplifying and writing all gamma functions in terms of factorials, we obtain the final form of the Laguerre polynomials:
$$L_N^j(x)=\sum_{k=0}^{N}\frac{(N+j)!\,(-1)^k}{(N-k)!\,k!\,(k+j)!}x^k.\tag{28.36}$$
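The finite sum (28.36) and the standardization in (28.35) can be cross-checked with exact rational arithmetic. The sketch below uses hypothetical helper names: it evaluates $L_N^j(x)$ from (28.36) and compares it with $\binom{N+j}{N}\,\Phi(-N,j+1;x)$, summing the terminating series for $\Phi$ term by term.

```python
from fractions import Fraction
from math import comb, factorial


def laguerre(N, j, x):
    """L_N^j(x) from the finite sum (28.36), evaluated exactly with fractions."""
    x = Fraction(x)
    return sum(
        (Fraction((-1) ** k * factorial(N + j),
                  factorial(N - k) * factorial(k) * factorial(k + j)) * x ** k
         for k in range(N + 1)),
        Fraction(0),
    )


def confluent_terminating(N, gamma, x):
    """Phi(-N, gamma; x) = sum_k [(-N)_k / (gamma)_k] x^k / k!, which stops at k = N."""
    x = Fraction(x)
    total, term = Fraction(0), Fraction(1)
    for k in range(N + 1):
        total += term
        # ratio of successive terms: (k - N) x / ((gamma + k)(k + 1)); it vanishes at k = N
        term *= Fraction(k - N, (gamma + k) * (k + 1)) * x
    return total


# Check (28.35): the standardization factor Gamma(N+j+1)/(Gamma(N+1)Gamma(j+1)) is C(N+j, N)
for N in range(6):
    for j in range(4):
        for x in (0, Fraction(1, 2), 3, -2):
            assert laguerre(N, j, x) == comb(N + j, N) * confluent_terminating(N, j + 1, x)
```

Exact fractions are used so that the comparison is an identity check rather than a floating-point tolerance test.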


The generating function of the Laguerre polynomials can be calculated using a procedure similar to the one used in the case of Hermite polynomials. We first write
$$g_j(t,x)=\sum_{n=0}^{\infty}a_nt^nL_n^j(x),$$


differentiate it with respect to $x$, and use the result of Problem 28.10 to obtain
$$\frac{\partial g_j}{\partial x}=-\sum_{n=0}^{\infty}a_nt^nL_{n-1}^{j+1}(x)=-t\sum_{n=0}^{\infty}a_nt^nL_n^{j+1}(x)=-t\,g_{j+1},\tag{28.37}$$
where $L_{-1}^{j+1}\equiv0$, and we have taken $a_n=a_{n-1}$ as a natural choice whereby the last sum could be written in closed form. To find a solution of (28.37), we look at $g_j(t,0)$. The recursion relation $a_n=a_{n-1}$ implies that all $a_n$ are equal, and we set all of them equal to 1. Then
$$g_j(t,0)=\sum_{n=0}^{\infty}t^nL_n^j(0)=\sum_{n=0}^{\infty}\frac{(n+j)!}{n!\,j!}t^n=(1-t)^{-j-1},$$
where we used the fact that the only contribution to $L_n^j(0)$ comes from the constant term in the polynomial [corresponding to $k=0$ in (28.36)]. Furthermore, the last sum is the binomial series (10.15) with $x\to-t$ and $\alpha\to(-j-1)$. This suggests defining a new function $g(t,x)$, taken to be the same for all $j$, via
$$g_j(t,x)=(1-t)^{-j-1}g(t,x).$$
Substitution of this in (28.37) gives
$$(1-t)^{-j-1}\frac{\partial g}{\partial x}=-t(1-t)^{-j-2}g\;\Rightarrow\;\frac{dg}{g}=-\frac{t}{1-t}\,dx\;\Rightarrow\;g=C(t)e^{-tx/(1-t)}$$
and
$$g_j(t,x)=(1-t)^{-j-1}C(t)e^{-tx/(1-t)}.$$
With the value of $g_j(t,0)$ given, we determine $C(t)$ to be one.

Box 28.2.2. The $n$th coefficient of the Maclaurin series expansion of the generating function $g_j(t,x)=(1-t)^{-j-1}e^{-tx/(1-t)}$ about $t=0$ is $L_n^j(x)$. Specifically,
$$L_n^j(x)=\frac{1}{n!}\frac{\partial^n}{\partial t^n}\left[\frac{e^{-tx/(1-t)}}{(1-t)^{j+1}}\right]_{t=0}.\tag{28.38}$$
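Box 28.2.2 can be checked without a computer-algebra system. Relying only on Python's standard library, the sketch below builds the Maclaurin coefficients of $(1-t)^{-j-1}e^{-tx/(1-t)}$ by power-series arithmetic and compares them with (28.36). The exponential of a series is obtained from the recurrence implied by $e'=u'e$. The function names are illustrative.

```python
from fractions import Fraction
from math import comb, factorial


def laguerre(N, j, x):
    """L_N^j(x) from the explicit sum (28.36)."""
    x = Fraction(x)
    return sum((Fraction((-1) ** k * factorial(N + j),
                         factorial(N - k) * factorial(k) * factorial(k + j)) * x ** k
                for k in range(N + 1)), Fraction(0))


def generating_coefficients(j, x, order):
    """First `order` Maclaurin coefficients (in t) of (1 - t)^(-j-1) * exp(-t x / (1 - t))."""
    x = Fraction(x)
    # u(t) = -x t / (1 - t) = -x (t + t^2 + ...): u_0 = 0 and u_m = -x for m >= 1
    u = [Fraction(0)] + [-x] * (order - 1)
    # e(t) = exp(u(t)); from e' = u' e one gets n e_n = sum_{m=1}^{n} m u_m e_{n-m}
    e = [Fraction(1)]
    for n in range(1, order):
        e.append(sum(m * u[m] * e[n - m] for m in range(1, n + 1)) / n)
    # (1 - t)^(-j-1) = sum_n C(n + j, n) t^n
    binom = [Fraction(comb(n + j, n)) for n in range(order)]
    # Cauchy product of the two power series
    return [sum(binom[i] * e[n - i] for i in range(n + 1)) for n in range(order)]
```

For example, `generating_coefficients(0, 1, 3)` reproduces $L_0^0(1)$, $L_1^0(1)$, and $L_2^0(1)$.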


28.3 The Wave Equation 


In the preceding sections the time variation has been given by a first derivative. Thus, as far as time is concerned, we have a FODE. It follows that the initial specification of the physical quantity of interest (temperature $T$ or Schrödinger wave function $\psi$) is sufficient to determine the solution uniquely.

The wave equation
$$\nabla^2\psi=\frac{1}{c^2}\frac{\partial^2\psi}{\partial t^2}\tag{28.39}$$
contains time derivatives of the second order and, therefore, requires two arbitrary parameters in its general solution. To determine these, we need two initial conditions. For example, if the wave is standing, as in a rope clamped at both ends, the initial shape of the rope is not sufficient to determine the wave function uniquely. One also needs to specify the initial (transverse) velocity of each point of the rope, i.e., the velocity profile on the rope.


Example 28.3.1. one-dimensional wave 

The simplest kind of wave equation is that in one dimension, for example, a wave propagating on a rope. Such a wave equation can be written as
$$\frac{\partial^2\psi}{\partial x^2}=\frac{1}{c^2}\frac{\partial^2\psi}{\partial t^2},$$
where $c$ is the speed of wave propagation. For a rope, this speed is related to the tension $\tau$ and the linear mass density $\rho$ by $c=\sqrt{\tau/\rho}$.

Let us assume that the rope has length $a$ and is fastened at both ends (located at $x=0$ and $x=a$). This means that the "displacement" $\psi$ is zero at $x=0$ and at $x=a$.

A separation of variables, $\psi(x,t)=X(x)T(t)$, leads to two ODEs:
$$\frac{d^2X}{dx^2}+\lambda X=0,\qquad\frac{d^2T}{dt^2}+c^2\lambda T=0.\tag{28.40}$$

The first equation and the spatial boundary conditions give rise to the solutions
$$X_n(x)=\sin\left(\frac{n\pi}{a}x\right),\qquad\lambda_n=\left(\frac{n\pi}{a}\right)^2\qquad\text{for }n=1,2,\dots$$

The second equation in (28.40) has a general solution of the form
$$T_n(t)=A_n\cos\omega_nt+B_n\sin\omega_nt,$$
where $\omega_n=cn\pi/a$ and $A_n$ and $B_n$ are arbitrary constants. The general solution is thus

$$\psi(x,t)=\sum_{n=1}^{\infty}(A_n\cos\omega_nt+B_n\sin\omega_nt)\sin\left(\frac{n\pi x}{a}\right).\tag{28.41}$$

Specification of the initial shape of the rope as $\psi(x,0)=f(x)$ gives a Fourier series,
$$f(x)=\sum_{n=1}^{\infty}A_n\sin\left(\frac{n\pi x}{a}\right),$$
from which we can determine $A_n$:
$$A_n=\frac{2}{a}\int_0^af(x)\sin\left(\frac{n\pi x}{a}\right)dx.$$


What about $B_n$? Physically, the shape of the wave is not enough to define the problem uniquely. It is possible that the rope, while having the required initial shape, may be in an unspecified motion of some sort. Thus, we must also know the "velocity profile," which means specifying the function $\partial\psi/\partial t$ at $t=0$. If it is given that $\partial\psi/\partial t|_{t=0}=g(x)$, then differentiating (28.41) with respect to $t$ and evaluating both sides at $t=0$ yields
$$g(x)=\sum_{n=1}^{\infty}\omega_nB_n\sin\left(\frac{n\pi x}{a}\right),$$
and $B_n$ is also determined:
$$B_n=\frac{2}{a\omega_n}\int_0^ag(x)\sin\left(\frac{n\pi x}{a}\right)dx.$$

The frequency $\omega_n$ is referred to as that of the $n$th mode of oscillation. Thus a general solution is a linear superposition of infinitely many modes. In practice, it is possible to "excite" one mode or, with appropriate initial conditions, a finite number of modes. ■
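The following is a numerical version of the recipe for $A_n$ and $B_n$. It is a minimal illustration that evaluates the integrals with a composite Simpson rule. It assumes a hypothetical parabolic initial shape $f(x)=4hx(a-x)/a^2$ released from rest, so every $B_n$ vanishes. For this shape the integrals give $A_n=32h/(n\pi)^3$ for odd $n$ and zero for even $n$, which serves as a check.

```python
import math


def simpson(f, lo, hi, n=2000):
    """Composite Simpson rule on [lo, hi]; n is rounded up to an even number of panels."""
    n += n % 2
    h = (hi - lo) / n
    s = f(lo) + f(hi)
    s += 4 * sum(f(lo + i * h) for i in range(1, n, 2))
    s += 2 * sum(f(lo + i * h) for i in range(2, n, 2))
    return s * h / 3


def string_coefficients(f, g, a, c, n_modes):
    """A_n and B_n of (28.41) for initial shape f and initial velocity g."""
    A, B = [], []
    for n in range(1, n_modes + 1):
        kn = n * math.pi / a
        omega_n = c * kn                      # omega_n = c n pi / a
        A.append(2 / a * simpson(lambda x: f(x) * math.sin(kn * x), 0.0, a))
        B.append(2 / (a * omega_n) * simpson(lambda x: g(x) * math.sin(kn * x), 0.0, a))
    return A, B


def psi(x, t, A, B, a, c):
    """Mode sum (28.41) truncated after len(A) terms."""
    total = 0.0
    for n, (An, Bn) in enumerate(zip(A, B), start=1):
        omega_n = c * n * math.pi / a
        total += (An * math.cos(omega_n * t) + Bn * math.sin(omega_n * t)) * math.sin(n * math.pi * x / a)
    return total


# Hypothetical example: parabolic initial shape of height h, released from rest.
a, c, h = 1.0, 1.0, 0.01
f = lambda x: 4 * h * x * (a - x) / a ** 2
g = lambda x: 0.0
A, B = string_coefficients(f, g, a, c, n_modes=15)
```

Because every mode has a period that divides $2a/c$, the truncated sum returns to its initial shape after that time, which is an easy sanity check on the signs.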


28.3.1 Guided Waves 

Waveguides are hollow tubes (or tubes filled with some dielectric material) in which electromagnetic waves can propagate along an axis, which we take to be the $z$-axis of either Cartesian or cylindrical coordinates. We assume that the dependence of the electric and magnetic fields on $z$ and $t$ is of the form $e^{i(\omega t-kz)}$, where $\omega$ and $k$ are constants to be determined. We therefore write
$$\mathbf E=\mathbf E_0(x,y)e^{i(\omega t-kz)},\qquad\mathbf B=\mathbf B_0(x,y)e^{i(\omega t-kz)}\tag{28.42}$$
for Cartesian coordinates. If cylindrical geometry is appropriate, then $\mathbf E_0$ and $\mathbf B_0$ will be functions of $\rho$ and $\varphi$. Note that, in general, $\mathbf E_0$ and $\mathbf B_0$ have three components.

The electric and magnetic fields of (28.42) ought to satisfy the four Maxwell's equations. Let us assume that the waveguide is free of any charges or currents, so that Maxwell's equations for empty space are appropriate. Because of the nature of the dependence on $z$, it is useful to separate the longitudinal geometry (the geometry along $z$) from the transverse geometry (the geometry perpendicular to $z$). So, we write


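$$\mathbf E=\mathbf E_t+\mathbf e_zE_z=(\mathbf E_{0t}+\mathbf e_zE_{0z})e^{i(\omega t-kz)},\qquad\mathbf B=\mathbf B_t+\mathbf e_zB_z=(\mathbf B_{0t}+\mathbf e_zB_{0z})e^{i(\omega t-kz)},$$
$$\nabla=\mathbf e_x\frac{\partial}{\partial x}+\mathbf e_y\frac{\partial}{\partial y}+\mathbf e_z\frac{\partial}{\partial z}\equiv\nabla_t+\mathbf e_z\frac{\partial}{\partial z},\tag{28.43}$$
where the subscript $t$ stands for transverse. With these assumptions, Maxwell's first equation becomes
$$0=\nabla\cdot\mathbf E=\left(\nabla_t+\mathbf e_z\frac{\partial}{\partial z}\right)\cdot\left[(\mathbf E_{0t}+\mathbf e_zE_{0z})e^{i(\omega t-kz)}\right]=(\nabla_t\cdot\mathbf E_{0t})e^{i(\omega t-kz)}+(-ikE_{0z})e^{i(\omega t-kz)},$$
because $\nabla_t\cdot(\mathbf e_zE_{0z})=0$ and $\mathbf e_z\cdot\mathbf E_{0t}=0$. It follows that
$$\nabla_t\cdot\mathbf E_{0t}=ikE_{0z}.$$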

An analogous calculation gives a similar result for Maxwell's second equation. Putting these two equations together, we get
$$\nabla_t\cdot\mathbf E_{0t}=ikE_{0z},\qquad\nabla_t\cdot\mathbf B_{0t}=ikB_{0z}.\tag{28.44}$$

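The LHS of Maxwell's third equation gives
$$\nabla\times\mathbf E=\left(\nabla_t+\mathbf e_z\frac{\partial}{\partial z}\right)\times\left[\mathbf E_0e^{i(\omega t-kz)}\right]=\nabla_t\times\left[\mathbf E_0e^{i(\omega t-kz)}\right]+\mathbf e_z\times\left[-ik\mathbf E_0e^{i(\omega t-kz)}\right]=e^{i(\omega t-kz)}\left(\nabla_t\times\mathbf E_0-ik\,\mathbf e_z\times\mathbf E_{0t}\right).$$
The RHS of the third equation is
$$-\frac{\partial\mathbf B}{\partial t}=-i\omega\mathbf B_0e^{i(\omega t-kz)}.$$
Equating the two sides yields
$$-i\omega\mathbf B_0=\nabla_t\times\mathbf E_0-ik\,\mathbf e_z\times\mathbf E_{0t}.\tag{28.45}$$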

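The first term on the RHS can be written as
$$\nabla_t\times\mathbf E_0=\begin{vmatrix}\mathbf e_x&\mathbf e_y&\mathbf e_z\\ \dfrac{\partial}{\partial x}&\dfrac{\partial}{\partial y}&0\\ E_{0x}&E_{0y}&E_{0z}\end{vmatrix}=\underbrace{\mathbf e_x\frac{\partial E_{0z}}{\partial y}-\mathbf e_y\frac{\partial E_{0z}}{\partial x}}_{\text{transverse}}+\mathbf e_z\left(\frac{\partial E_{0y}}{\partial x}-\frac{\partial E_{0x}}{\partial y}\right)$$
$$=\nabla\times(E_{0z}\mathbf e_z)+\mathbf e_z\left(\frac{\partial E_{0y}}{\partial x}-\frac{\partial E_{0x}}{\partial y}\right).$$
The reader may easily check that the second line follows from the first. Equating the transverse parts of the two sides of Equation (28.45), we get
$$-i\omega\mathbf B_{0t}=\nabla\times(E_{0z}\mathbf e_z)-ik\,\mathbf e_z\times\mathbf E_{0t}.\tag{28.46}$$
A similar calculation turns the fourth Maxwell equation, $\nabla\times\mathbf B=(1/c^2)\,\partial\mathbf E/\partial t$ for empty space, into
$$\frac{i\omega}{c^2}\mathbf E_{0t}=\nabla\times(B_{0z}\mathbf e_z)-ik\,\mathbf e_z\times\mathbf B_{0t}.\tag{28.47}$$

We now want to express the transverse components in terms of the longitudinal components. Multiply both sides of (28.47) by $-i\omega$ and substitute for $-i\omega\mathbf B_{0t}$ from (28.46):
$$\frac{\omega^2}{c^2}\mathbf E_{0t}=-i\omega\nabla\times(B_{0z}\mathbf e_z)-ik\,\mathbf e_z\times\left[\nabla\times(E_{0z}\mathbf e_z)-ik\,\mathbf e_z\times\mathbf E_{0t}\right]$$
$$=-i\omega\nabla\times(B_{0z}\mathbf e_z)-ik\,\mathbf e_z\times[\nabla\times(E_{0z}\mathbf e_z)]-k^2\,\mathbf e_z\times(\mathbf e_z\times\mathbf E_{0t}).$$
Using the bac cab rule, the last term gives
$$\mathbf e_z\times(\mathbf e_z\times\mathbf E_{0t})=\mathbf e_z\underbrace{(\mathbf e_z\cdot\mathbf E_{0t})}_{=0}-\mathbf E_{0t}(\mathbf e_z\cdot\mathbf e_z)=-\mathbf E_{0t}.$$
It now follows that
$$\left(\frac{\omega^2}{c^2}-k^2\right)\mathbf E_{0t}=-i\omega\nabla\times(B_{0z}\mathbf e_z)-ik\,\mathbf e_z\times[\nabla\times(E_{0z}\mathbf e_z)].\tag{28.48}$$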

The first term on the RHS can be simplified by using the second equation in (14.11):
$$\nabla\times(B_{0z}\mathbf e_z)=B_{0z}\underbrace{\nabla\times\mathbf e_z}_{=0}+(\nabla B_{0z})\times\mathbf e_z=-\mathbf e_z\times(\nabla_tB_{0z}),$$
because $\mathbf e_z$ is a constant vector (both magnitude and direction), and $B_{0z}$ is independent of $z$. The second term on the RHS of (28.48) can be simplified as follows:


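$$\mathbf e_z\times[\nabla\times(E_{0z}\mathbf e_z)]=\mathbf e_z\times\left(\mathbf e_x\frac{\partial E_{0z}}{\partial y}-\mathbf e_y\frac{\partial E_{0z}}{\partial x}\right)=\mathbf e_y\frac{\partial E_{0z}}{\partial y}+\mathbf e_x\frac{\partial E_{0z}}{\partial x}=\nabla_tE_{0z}.$$
Substituting these results in Equation (28.48) yields
$$\gamma^2\mathbf E_{0t}=i\left[-k\nabla_tE_{0z}+\omega\,\mathbf e_z\times(\nabla_tB_{0z})\right],\qquad\text{where }\gamma^2\equiv\frac{\omega^2}{c^2}-k^2.$$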

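A similar calculation gives an analogous result for the magnetic field. We assemble these two equations in
$$\gamma^2\mathbf E_{0t}=i\left[-k\nabla_tE_{0z}+\omega\,\mathbf e_z\times(\nabla_tB_{0z})\right],\qquad\gamma^2\mathbf B_{0t}=i\left[-k\nabla_tB_{0z}-\frac{\omega}{c^2}\,\mathbf e_z\times(\nabla_tE_{0z})\right],\qquad\gamma^2=\frac{\omega^2}{c^2}-k^2.\tag{28.49}$$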

Although we derived (28.44) and (28.49) using Cartesian coordinates, the fact that the final result is written explicitly in terms of transverse and longitudinal parts, without reference to any coordinate system, implies that these equations are valid in cylindrical coordinates as well.

Three types of guided waves are usually studied.

1. Transverse magnetic (TM) waves have $B_z=0$ everywhere. The boundary condition on $\mathbf E$ demands that $E_z$ vanish at the conducting walls of the guide.

2. Transverse electric (TE) waves have $E_z=0$ everywhere. The boundary condition on $\mathbf B$ requires that the normal directional derivative
$$\frac{\partial B_z}{\partial n}=\mathbf e_n\cdot(\nabla B_z)$$
vanish at the walls.

3. Transverse electromagnetic (TEM) waves have $B_z=0=E_z$. For a nontrivial solution, Equation (28.49) demands that $\gamma^2=0$, i.e., $k=\omega/c$, which is the relation obeyed by a free wave with no boundaries.

In the following, we consider some examples of the TM mode (see any book on electromagnetic theory for further details). The basic equations in this mode are
$$B_{0z}=0,\qquad\gamma^2\mathbf E_{0t}=-ik\nabla_tE_{0z},\qquad\gamma^2\mathbf B_{0t}=-i\frac{\omega}{c^2}\,\mathbf e_z\times(\nabla_tE_{0z}).$$
Taking the dot product of $\nabla_t$ with the middle equation and using the first equation in (28.44) yields
$$\nabla_t^2E_{0z}+\gamma^2E_{0z}=0.\tag{28.50}$$
This is the basic equation for TM waves propagating in a waveguide.
Example 28.3.2. rectangular wave guides 

For a wave guide with a rectangular cross section of sides $a$ and $b$ in the $x$ and the $y$ direction, respectively, (28.50) gives
$$\frac{\partial^2E_{0z}}{\partial x^2}+\frac{\partial^2E_{0z}}{\partial y^2}+\gamma^2E_{0z}=0.$$
A separation of variables, $E_{0z}(x,y)=X(x)Y(y)$, leads to
$$\frac{d^2X}{dx^2}+\lambda X=0,\quad X(0)=0=X(a),\qquad\frac{d^2Y}{dy^2}+\mu Y=0,\quad Y(0)=0=Y(b),$$
where $\gamma^2=\lambda+\mu$. These equations have the solutions
$$X_n(x)=\sin\left(\frac{n\pi}{a}x\right),\quad\lambda_n=\left(\frac{n\pi}{a}\right)^2\quad\text{for }n=1,2,\dots,$$
$$Y_m(y)=\sin\left(\frac{m\pi}{b}y\right),\quad\mu_m=\left(\frac{m\pi}{b}\right)^2\quad\text{for }m=1,2,\dots$$
The wave number is given by $k=\sqrt{(\omega/c)^2-\gamma^2}$, or, introducing indices for $k$,
$$k_{mn}=\sqrt{\frac{\omega^2}{c^2}-\pi^2\left(\frac{n^2}{a^2}+\frac{m^2}{b^2}\right)},$$
which has to be real if the wave is to propagate [an imaginary $k$ leads to exponential decay or growth along the $z$-axis because of the exponential factor in (28.42)]. Thus, there is a cutoff frequency,
$$\omega_{mn}=c\pi\sqrt{\frac{n^2}{a^2}+\frac{m^2}{b^2}}\qquad\text{for }m,n\ge1,$$
below which the wave cannot propagate through the wave guide. It follows that, for a TM wave, the lowest frequency that can propagate along a rectangular wave guide is $\omega_{11}=\pi c\sqrt{a^2+b^2}/(ab)$.




Other PDEs of Mathematical Physics 




The most general solution for $E_z$ is, therefore,
$$E_z=\sum_{m,n=1}^{\infty}A_{mn}\sin\left(\frac{n\pi}{a}x\right)\sin\left(\frac{m\pi}{b}y\right)e^{i(\omega t\pm k_{mn}z)}.$$
The constants $A_{mn}$ are arbitrary and can be determined from the initial shape of the wave, but that is not commonly done. Once $E_z$ is found, the other components can be calculated using Equation (28.49). ■
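The following small sketch puts numbers to the cutoff condition by evaluating $\omega_{mn}$ and $k_{mn}$ for TM modes of a rectangular guide. The indexing follows the text, with $n$ along the side $a$ and $m$ along the side $b$. The dimensions in the usage lines are an assumed illustrative example. Returning `None` below cutoff is simply one convenient way to flag an imaginary $k$.

```python
import math

C_LIGHT = 299_792_458.0  # speed of light in vacuum, m/s


def tm_cutoff(m, n, a, b, c=C_LIGHT):
    """Cutoff angular frequency c*pi*sqrt((n/a)^2 + (m/b)^2) of TM_mn (n along side a, m along side b)."""
    if m < 1 or n < 1:
        raise ValueError("TM modes of a rectangular guide need m, n >= 1")
    return c * math.pi * math.hypot(n / a, m / b)


def tm_wavenumber(omega, m, n, a, b, c=C_LIGHT):
    """Propagation constant k_mn; returns None below cutoff, where k would be imaginary."""
    k_squared = (omega / c) ** 2 - (math.pi * n / a) ** 2 - (math.pi * m / b) ** 2
    return math.sqrt(k_squared) if k_squared > 0 else None


def lowest_tm_modes(a, b, count=4, max_index=6, c=C_LIGHT):
    """The `count` TM modes with the lowest cutoffs, as (omega_c, m, n) tuples."""
    modes = [(tm_cutoff(m, n, a, b, c), m, n)
             for m in range(1, max_index + 1) for n in range(1, max_index + 1)]
    return sorted(modes)[:count]


# Illustrative (assumed) guide dimensions in metres: a = 22.86 mm, b = 10.16 mm.
a, b = 22.86e-3, 10.16e-3
```

As the text states, the lowest TM mode is $(m,n)=(1,1)$, and doubling the frequency above its cutoff gives $k=\sqrt3\,\omega_{11}/c$.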


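Example 28.3.3. cylindrical wave guide

For a TM wave propagating along the $z$-axis in a hollow circular conductor of radius $a$, we have [see Equation (28.50)]
$$\underbrace{\frac{1}{\rho}\frac{\partial}{\partial\rho}\left(\rho\frac{\partial E_{0z}}{\partial\rho}\right)+\frac{1}{\rho^2}\frac{\partial^2E_{0z}}{\partial\varphi^2}}_{=\nabla_t^2E_{0z}}+\gamma^2E_{0z}=0.$$
The separation $E_{0z}=R(\rho)S(\varphi)$ yields $S(\varphi)=A\cos m\varphi+B\sin m\varphi$ and the Bessel DE
$$\frac{d^2R}{d\rho^2}+\frac{1}{\rho}\frac{dR}{d\rho}+\left(\gamma^2-\frac{m^2}{\rho^2}\right)R=0.$$
The solution to this equation, which is regular at $\rho=0$ and vanishes at $\rho=a$, is
$$R(\rho)=J_m\!\left(\frac{x_{mn}}{a}\rho\right),$$
where $x_{mn}$ is the $n$th root of $J_m$. Recalling the definition of $\gamma$, we obtain
$$\frac{\omega^2}{c^2}-k^2=\gamma^2=\frac{x_{mn}^2}{a^2}\qquad\text{and}\qquad k=\sqrt{\frac{\omega^2}{c^2}-\frac{x_{mn}^2}{a^2}}.$$
The solution for the azimuthally symmetric case ($m=0$) is
$$E_z(\rho,\varphi,t)=\sum_{n=1}^{\infty}A_nJ_0\!\left(\frac{x_{0n}}{a}\rho\right)e^{i(\omega t\pm k_nz)}\qquad\text{and}\qquad B_z=0,$$
where $k_n=\sqrt{\omega^2/c^2-x_{0n}^2/a^2}$. ■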
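For the circular guide the cutoffs are $\omega_c=cx_{mn}/a$, so everything hinges on the zeros $x_{mn}$ of $J_m$. The sketch below does not use any special-function library. It sums the power series of $J_m$, which is accurate enough for the first few zeros, and locates zeros by scanning for sign changes and bisecting. In practice one would likely use a library routine such as SciPy's `scipy.special.jn_zeros` instead.

```python
import math


def bessel_j(m, x, terms=60):
    """J_m(x) from its power series; adequate in double precision for 0 <= x <~ 20."""
    total, term = 0.0, (x / 2) ** m / math.factorial(m)
    for k in range(terms):
        total += term
        term *= -(x / 2) ** 2 / ((k + 1) * (k + 1 + m))
    return total


def bessel_zeros(m, count, step=0.05):
    """First `count` positive zeros x_m1, x_m2, ... of J_m (scan for sign changes, then bisect)."""
    zeros = []
    lo, f_lo = step, bessel_j(m, step)
    while len(zeros) < count:
        hi = lo + step
        f_hi = bessel_j(m, hi)
        if f_lo * f_hi < 0:
            left, right = lo, hi
            for _ in range(60):
                mid = 0.5 * (left + right)
                if bessel_j(m, left) * bessel_j(m, mid) <= 0:
                    right = mid
                else:
                    left = mid
            zeros.append(0.5 * (left + right))
        lo, f_lo = hi, f_hi
    return zeros


def circular_tm_cutoff(m, n, radius, c=299_792_458.0):
    """Cutoff angular frequency c * x_mn / a of the TM_mn mode of a hollow circular guide."""
    return c * bessel_zeros(m, n)[n - 1] / radius


def circular_tm_wavenumber(omega, m, n, radius, c=299_792_458.0):
    """k = sqrt(omega^2/c^2 - x_mn^2/a^2); None below cutoff."""
    x_mn = bessel_zeros(m, n)[n - 1]
    k_squared = (omega / c) ** 2 - (x_mn / radius) ** 2
    return math.sqrt(k_squared) if k_squared > 0 else None
```

The lowest axisymmetric TM mode has $x_{01}\approx2.405$, so its cutoff is about $2.405\,c/a$.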


28.3.2 Vibrating Membrane 

Waves on a circular drumhead are historically important because their investigation was one of the first instances in which Bessel functions appeared. The following example considers such waves.

For a circular membrane over a cylinder, the wave equation (28.39) in cylindrical coordinates becomes¹¹
$$\frac{1}{\rho}\frac{\partial}{\partial\rho}\left(\rho\frac{\partial\psi}{\partial\rho}\right)+\frac{1}{\rho^2}\frac{\partial^2\psi}{\partial\varphi^2}=\frac{1}{c^2}\frac{\partial^2\psi}{\partial t^2},$$

¹¹ Assuming that the membrane is perpendicular to the $z$-axis, the wave amplitude will depend only on $\rho$ and $\varphi$ (and on $t$).
which, after separation of variables, reduces to
$$S(\varphi)=A\cos m\varphi+B\sin m\varphi\quad\text{for }m=0,1,2,\dots,$$
$$T(t)=C\cos\omega t+D\sin\omega t,$$
$$\frac{d^2R}{d\rho^2}+\frac{1}{\rho}\frac{dR}{d\rho}+\left(\frac{\omega^2}{c^2}-\frac{m^2}{\rho^2}\right)R=0.$$
The solution of the last (Bessel) equation, which is finite at $\rho=0$ and vanishes at $\rho=a$, is
$$R(\rho)=E\,J_m\!\left(\frac{x_{mn}}{a}\rho\right),\qquad\text{where}\qquad\frac{\omega}{c}=\frac{x_{mn}}{a}\qquad\text{and}\qquad n=1,2,\dots$$
This shows that only the frequencies $\omega_{mn}=(c/a)x_{mn}$ are excited.

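If we assume an initial shape for the membrane, given by $f(\rho,\varphi)$, and an initial velocity of zero, then $D=0$, and the general solution will be
$$\psi(\rho,\varphi,t)=\sum_{m=0}^{\infty}\sum_{n=1}^{\infty}J_m\!\left(\frac{x_{mn}}{a}\rho\right)\cos\left(\frac{x_{mn}c}{a}t\right)\left(A_{mn}\cos m\varphi+B_{mn}\sin m\varphi\right),$$
where, for $m\ge1$,
$$A_{mn}=\frac{2}{\pi a^2J_{m+1}^2(x_{mn})}\int_0^{2\pi}d\varphi\int_0^a d\rho\,\rho f(\rho,\varphi)J_m\!\left(\frac{x_{mn}}{a}\rho\right)\cos m\varphi,$$
$$B_{mn}=\frac{2}{\pi a^2J_{m+1}^2(x_{mn})}\int_0^{2\pi}d\varphi\int_0^a d\rho\,\rho f(\rho,\varphi)J_m\!\left(\frac{x_{mn}}{a}\rho\right)\sin m\varphi$$
(for $m=0$ the factor 2 in $A_{0n}$ is replaced by 1, because $\int_0^{2\pi}d\varphi=2\pi$ rather than $\pi$), and the orthogonality of Bessel functions (27.24) has been used. In particular, if the initial displacement of the membrane is independent of $\varphi$, then only the term with $m=0$ contributes, and we get
$$\psi(\rho,t)=\sum_{n=1}^{\infty}A_nJ_0\!\left(\frac{x_{0n}}{a}\rho\right)\cos\left(\frac{x_{0n}c}{a}t\right),$$
where
$$A_n=\frac{2}{a^2J_1^2(x_{0n})}\int_0^a d\rho\,\rho f(\rho)J_0\!\left(\frac{x_{0n}}{a}\rho\right).$$
Note that the wave does not develop any $\varphi$-dependence at later times.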
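The coefficient formula for an axisymmetric initial shape can be tried numerically. The sketch below is illustrative and assumes a hypothetical dome-shaped initial displacement $f(\rho)=1-\rho^2/a^2$. For this shape the integral can be done in closed form, giving $A_n=8/[x_{0n}^3J_1(x_{0n})]$, which serves as a check. Only Python's standard library is used, with a power series for the Bessel functions and bisection for their zeros.

```python
import math


def bessel_j(m, x, terms=60):
    """J_m(x) from its power series (adequate for the small arguments used here)."""
    total, term = 0.0, (x / 2) ** m / math.factorial(m)
    for k in range(terms):
        total += term
        term *= -(x / 2) ** 2 / ((k + 1) * (k + 1 + m))
    return total


def j0_zeros(count, step=0.05):
    """First `count` positive zeros x_0n of J_0 (scan for sign changes, then bisect)."""
    zeros, lo = [], step
    while len(zeros) < count:
        hi = lo + step
        if bessel_j(0, lo) * bessel_j(0, hi) < 0:
            left, right = lo, hi
            for _ in range(60):
                mid = 0.5 * (left + right)
                if bessel_j(0, left) * bessel_j(0, mid) <= 0:
                    right = mid
                else:
                    left = mid
            zeros.append(0.5 * (left + right))
        lo = hi
    return zeros


def axisymmetric_coefficients(f, a, count, panels=400):
    """A_n = 2 / (a^2 J_1(x_0n)^2) * integral_0^a rho f(rho) J_0(x_0n rho / a) d rho, by Simpson's rule."""
    coeffs = []
    h = a / panels
    for x0n in j0_zeros(count):
        integrand = lambda r: r * f(r) * bessel_j(0, x0n * r / a)
        s = integrand(0.0) + integrand(a)
        s += sum((4 if i % 2 else 2) * integrand(i * h) for i in range(1, panels))
        coeffs.append(2 / (a ** 2 * bessel_j(1, x0n) ** 2) * s * h / 3)
    return coeffs
```

The ratio $x_{02}/x_{01}\approx2.295$ shows why a drum's overtones, unlike a string's, are not integer multiples of the fundamental.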


28.4 Problems 

28.1. Suppose $\lambda=0$ in Equation (28.3). Show that $X(x)=0$.

28.2. The two ends of a thin heat-conducting bar are held at $T=0$. Initially, the first half of the bar is held at $T=T_0$, and the second half is held at $T=0$. The lateral surface of the bar is then thermally insulated. Find the temperature distribution for all time.
28.3. The two ends of a thin heat-conducting bar of length $b$ are held at $T=0$. The bar is along the $x$-axis with one end at $x=0$ and the other at $x=b$. The lateral surface of the bar is thermally insulated. Find the temperature distribution at all times if initially it is given by:

(a) $T(0,x)=T_0$ for the middle third of the bar, and $0$ for the other two-thirds.

(b) $T(0,x)=\begin{cases}0&\text{if }0<x<b/3\text{ or }2b/3<x<b,\\ T_0\sin\left(\dfrac{3\pi}{b}x-\pi\right)&\text{if }b/3<x<2b/3.\end{cases}$

(c) $T(0,x)=T_0\left(\dfrac{x}{b}-\dfrac{1}{2}\right)$.

(d) $T(0,x)=T_0\sin\left(\dfrac{\pi}{b}x\right)$.

28.4. Derive Equation (28.7) from the equation before it by changing variables. 

28.5. Using the Stirling approximation (11.6), write all four factorials of 
Equation (28.14) as exponentials. Then simplify to arrive at Equation (28.15). 

28.6. Derive Equations (28.20)-(28.23). 

28.7. By differentiating both sides of Equation (28.23) with respect to $y$, and (slightly) manipulating the resulting sum, show that $H_n'(y)=2nH_{n-1}(y)$.

28.8. Evaluate Equation (28.23) at $y=0$ and note that only the last term survives. Now show that
$$H_n(0)=\begin{cases}0&\text{if }n\text{ is odd},\\[4pt]\dfrac{(-1)^k(2k)!}{k!}&\text{if }n=2k.\end{cases}$$


28.9. Use the substitution $u(z)=z^{l+1}e^{-z/2}f(z)$ in (28.32) to derive Equation (28.33).

28.10. Differentiate both sides of Equation (28.36) with respect to $x$ and show that
$$\frac{d}{dx}L_n^j(x)=-L_{n-1}^{j+1}(x).$$

28.11. The two ends of a rope of length a are fixed. The midpoint of the 
rope is raised a distance a/2, measured perpendicular to the tense rope, and 
released from rest. What is the subsequent wave function? 

28.12. A string of length $a$ fastened at both ends has an initial velocity of zero and is given an initial displacement as shown in Figure 28.1. Find $\psi(x,t)$ in each case.

Figure 28.1: The initial shape of the waves.


28.13. Repeat Problem 28.12 assuming that the initial displacement is zero 
and the initial velocity distribution is given by each figure. 

28.14. Repeat Problem 28.13 assuming that the initial velocity distribution is given by:

(a) $g(x)=\begin{cases}v_0&\text{if }0<x<a/2,\\0&\text{if }a/2<x<a.\end{cases}$

(b) $g(x)=\begin{cases}v_0\sin\left(\dfrac{2\pi x}{a}\right)&\text{if }0<x<a/2,\\0&\text{if }a/2<x<a.\end{cases}$


28.15. A wave guide consists of two coaxial cylinders of radii a and b (b > a). 
Find the electric field for a TM mode propagating along the two cylinders in 
the region between them. Hint: Both linearly independent solutions of the 
Bessel DE are needed for the radial function. 




Part VII 
Special Topics 



Chapter 29 

Integral Transforms 


Chapters 26 and 27 illustrated the Frobenius method of solving differential equations using power series, which gives a solution that converges within an interval of the real line. This chapter introduces another method of solving DEs, which uses integral transforms. The integral transform of a function $v$ is another function $u$ given by
$$u(x)=\int_a^bK(x,t)v(t)\,dt,\tag{29.1}$$
where $(a,b)$ is a convenient interval, and $K(x,t)$, called the kernel of the integral transform, is an appropriate function of two variables.

The idea behind using integral transforms is to write the solution $u(x)$ of a DE in $x$ in terms of an integral such as Equation (29.1) and choose $v$, the kernel, and the interval $(a,b)$ in such a way as to render the DE more manageable. There are many kernels appropriate for specific DEs. However, two kernels are the most widely used in physics; they lead to two important integral transforms, the Fourier transform and the Laplace transform.




29.1 The Fourier Transform 

The Fourier transform has a kernel of the form $K(x,t)=e^{itx}$ and an interval $(-\infty,+\infty)$. Let us see how this comes about.

The Fourier series representation of a function $F(x)$ is valid for the entire real line as long as $F(x)$ is periodic. However, most functions encountered in physical applications are defined in some interval $(a,b)$ without repetition beyond that interval. It would be useful if we could also expand such functions in some form of Fourier "series."

One way to do this is to start with the periodic series and then let the period go to infinity while extending the domain of the definition of the function. As a specific case, suppose we are interested in representing a function $f(x)$ that is defined only for the interval $(a,b)$ and is assigned the value zero
Figure 29.1: (a) The function we want to represent. (b) The Fourier series representation of the function.


everywhere else [see Figure 29.1(a)]. To begin with, we might try the Fourier 
series representation, but this will produce a repetition of our function. This 
situation is depicted in Figure 29.1(b). 

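Next we may try a function $f_\Lambda(x)$ defined in the interval $(a-\Lambda/2,\,b+\Lambda/2)$, where $\Lambda$ is an arbitrary positive number:
$$f_\Lambda(x)=\begin{cases}0&\text{if }a-\Lambda/2<x<a,\\ f(x)&\text{if }a<x<b,\\ 0&\text{if }b<x<b+\Lambda/2.\end{cases}$$
This function, which is depicted in Figure 29.2, has the Fourier series representation [see Equation (18.23)]
$$f_\Lambda(x)=\frac{1}{\sqrt{L+\Lambda}}\sum_{n=-\infty}^{\infty}f_{\Lambda,n}\,e^{2\pi inx/(L+\Lambda)},\tag{29.2}$$
where $L\equiv b-a$ and
$$f_{\Lambda,n}=\frac{1}{\sqrt{L+\Lambda}}\int_{a-\Lambda/2}^{b+\Lambda/2}e^{-2\pi inx/(L+\Lambda)}f_\Lambda(x)\,dx.\tag{29.3}$$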

We have managed to separate the copies of the original function produced by the periodic extension by a distance $\Lambda$. It should be clear that if $\Lambda\to\infty$, we can completely isolate the function and stop the repetition. Let us investigate the behavior of Equations (29.2) and (29.3) as $\Lambda$ grows without bound. First, we notice that the quantity $k_n$ defined by $k_n=2n\pi/(L+\Lambda)$ and appearing in the exponent becomes almost continuous. In other words, as $n$ changes by one unit, $k_n$ changes only slightly. This suggests that the terms in the sum in Equation (29.2) can be lumped together in $j$ intervals of width $\Delta n_j$, giving
$$f_\Lambda(x)\approx\frac{1}{\sqrt{L+\Lambda}}\sum_{j=-\infty}^{\infty}f_\Lambda(k_j)\,e^{ik_jx}\,\Delta n_j,$$

Figure 29.2: By introducing the parameter $\Lambda$, we have managed to separate the copies of the function.


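where $k_j=2n_j\pi/(L+\Lambda)$, and $f_\Lambda(k_j)\equiv f_{\Lambda,n_j}$. Substituting $\Delta n_j=[(L+\Lambda)/2\pi]\Delta k_j$ in the above sum, we obtain
$$f_\Lambda(x)\approx\frac{1}{\sqrt{L+\Lambda}}\,\frac{L+\Lambda}{2\pi}\sum_{j=-\infty}^{\infty}f_\Lambda(k_j)e^{ik_jx}\Delta k_j=\frac{1}{\sqrt{2\pi}}\sum_{j=-\infty}^{\infty}\tilde f_\Lambda(k_j)e^{ik_jx}\Delta k_j,$$
where we introduced $\tilde f_\Lambda(k_j)$ defined by $\tilde f_\Lambda(k_j)\equiv\sqrt{(L+\Lambda)/2\pi}\,f_\Lambda(k_j)$. It is now clear that the preceding sum approaches an integral in the limit $\Lambda\to\infty$. In the same limit, $f_\Lambda(x)\to f(x)$, and we have
$$f(x)=\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\tilde f(k)e^{ikx}\,dk,\tag{29.4}$$
where
$$\tilde f(k)=\lim_{\Lambda\to\infty}\tilde f_\Lambda(k_j)=\lim_{\Lambda\to\infty}\sqrt{\frac{L+\Lambda}{2\pi}}\,\frac{1}{\sqrt{L+\Lambda}}\int_{a-\Lambda/2}^{b+\Lambda/2}e^{-ik_jx}f_\Lambda(x)\,dx,$$
or
$$\tilde f(k)=\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}f(x)e^{-ikx}\,dx.\tag{29.5}$$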


The function $f$ in (29.4) is called the Fourier transform of $\tilde f$, and $\tilde f$ in (29.5) is called the inverse Fourier transform of $f$. Note that the difference between the two transforms is the sign of the exponential in the integrand.

Another notation that is commonly used for the Fourier transform of a function $f$ is $\mathfrak F[f]$. The inverse Fourier transform of a function $g$ is then denoted by $\mathfrak F^{-1}[g]$. This means that $\mathfrak F[f]$ is a function whose value at $x$ is given by
$$\mathfrak F[f](x)=\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}f(k)e^{ikx}\,dk.\tag{29.6}$$
Similarly, $\mathfrak F^{-1}[g]$ is a function whose value at $k$ is given by
$$\mathfrak F^{-1}[g](k)=\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}g(x)e^{-ikx}\,dx.\tag{29.7}$$
Note that the use of $k$ and $x$ in these two equations is completely arbitrary. The only requirement is that the function and the variable in its argument on the left appear, respectively, in the integrand and in the exponent on the right. For example, (29.6) could be written as
$$\mathfrak F[f](y)=\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}f(x)e^{ixy}\,dx,\qquad\text{or}\qquad\mathfrak F[h](t)=\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}h(\omega)e^{i\omega t}\,d\omega,$$
and (29.7) as
$$\mathfrak F^{-1}[g](x)=\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}g(k)e^{-ikx}\,dk,\qquad\text{or}\qquad\mathfrak F^{-1}[f](y)=\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}f(x)e^{-ixy}\,dx.$$


29.1.1 Properties of Fourier Transform 


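Equations (29.4) and (29.5) are reciprocals of one another. However, it is not obvious that they are consistent. In other words, if we substitute (29.4) in the RHS of (29.5), do we get an identity? Let's try this:
$$\tilde f(k)=\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}dx\,e^{-ikx}\left[\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\tilde f(k')e^{ik'x}\,dk'\right]=\frac{1}{2\pi}\int_{-\infty}^{\infty}dx\int_{-\infty}^{\infty}\tilde f(k')e^{i(k'-k)x}\,dk'.$$
We now change the order of the two integrations:
$$\tilde f(k)=\int_{-\infty}^{\infty}dk'\,\tilde f(k')\left[\frac{1}{2\pi}\int_{-\infty}^{\infty}dx\,e^{i(k'-k)x}\right].$$
But the expression in the square brackets is the Dirac delta function $\delta(k'-k)$ given by Equation (18.28). Thus, we have $\tilde f(k)=\int_{-\infty}^{\infty}dk'\,\tilde f(k')\,\delta(k'-k)$, which is an identity. In the $\mathfrak F$ notation, this result can be written as
$$\mathfrak F^{-1}[\mathfrak F[f]]=\mathfrak F[\mathfrak F^{-1}[f]]=f,\tag{29.8}$$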


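for any function $f$. The second identity can be shown similarly. Another property enjoyed by the Fourier transform and its inverse is linearity. If $a$ and $b$ are constants and $f$ and $g$ functions, then
$$\mathfrak F[af+bg]=a\mathfrak F[f]+b\mathfrak F[g],\qquad\mathfrak F^{-1}[af+bg]=a\mathfrak F^{-1}[f]+b\mathfrak F^{-1}[g].\tag{29.9}$$

It is useful to generalize the Fourier transform equations to more than one dimension. The generalization is straightforward:
$$\mathfrak F[\tilde f](\mathbf r)=f(\mathbf r)=\frac{1}{(2\pi)^{n/2}}\int_{\Omega_k}d^nk\,\tilde f(\mathbf k)e^{i\mathbf k\cdot\mathbf r},\qquad\mathfrak F^{-1}[f](\mathbf k)=\tilde f(\mathbf k)=\frac{1}{(2\pi)^{n/2}}\int_{\Omega_x}d^nx\,f(\mathbf r)e^{-i\mathbf k\cdot\mathbf r},\tag{29.10}$$
where $n$ is usually 2 or 3, $\Omega_k$ is the entire $k$-space, and $\Omega_x$ is the entire $x$-space.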
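Sign and normalization conventions for Fourier transforms differ between books, so it can help to pin down the ones used here, (29.6) and (29.7), in code. The sketch below is a plain trapezoid-rule approximation on a truncated interval. That simplification is adequate only for smooth, rapidly decaying functions. The test checks the inversion identity (29.8) and linearity (29.9) on Gaussians.

```python
import cmath
import math


def _trapezoid(samples, h):
    """Trapezoid rule for equally spaced samples."""
    return h * (sum(samples) - 0.5 * (samples[0] + samples[-1]))


def fourier(f_tilde, x, cutoff=12.0, n=800):
    """F[f~](x) = (2 pi)^(-1/2) * integral of f~(k) e^{+ikx} dk, as in (29.6), truncated to |k| <= cutoff."""
    h = 2 * cutoff / n
    ks = [-cutoff + i * h for i in range(n + 1)]
    return _trapezoid([f_tilde(k) * cmath.exp(1j * k * x) for k in ks], h) / math.sqrt(2 * math.pi)


def inverse_fourier(f, k, cutoff=12.0, n=800):
    """F^{-1}[f](k) = (2 pi)^(-1/2) * integral of f(x) e^{-ikx} dx, as in (29.7), truncated to |x| <= cutoff."""
    h = 2 * cutoff / n
    xs = [-cutoff + i * h for i in range(n + 1)]
    return _trapezoid([f(x) * cmath.exp(-1j * k * x) for x in xs], h) / math.sqrt(2 * math.pi)
```

With this symmetric $1/\sqrt{2\pi}$ normalization, $e^{-x^2/2}$ is its own transform, which makes a convenient first test.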
29.1.2 Sine and Cosine Transforms

The complex exponential in the definition of the Fourier transform or its inverse can be broken down into its trigonometric parts. Then for an even function only the cosine part contributes, and for an odd function only the sine part contributes. In either case, the integral over $(-\infty,\infty)$ can be replaced by $2\int_0^\infty$. This leads to the sine transform and cosine transform, denoted by $\mathfrak F_s[f]$ and $\mathfrak F_c[f]$, respectively, for any function $f$:
$$\mathfrak F_s[f](x)=\sqrt{\frac{2}{\pi}}\int_0^\infty f(k)\sin kx\,dk,\qquad\mathfrak F_c[f](x)=\sqrt{\frac{2}{\pi}}\int_0^\infty f(k)\cos kx\,dk.\tag{29.11}$$


What is the inverse of a cosine transform? To find out, let $F(x)$ denote the left-hand side of the second equation in (29.11). Multiply both sides of the equation by $\cos k'x$, with $k'>0$, and integrate over all positive values of $x$ to get
$$\int_0^\infty F(x)\cos k'x\,dx=\sqrt{\frac{2}{\pi}}\int_0^\infty f(k)\,dk\int_0^\infty\cos kx\cos k'x\,dx.\tag{29.12}$$


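Writing the cosines in terms of exponentials, the $x$ integration on the right gives
$$\begin{aligned}\int_0^\infty\cos kx\cos k'x\,dx&=\frac14\int_0^\infty\left(e^{ikx}+e^{-ikx}\right)\left(e^{ik'x}+e^{-ik'x}\right)dx\\&=\frac14\int_0^\infty\left[e^{ix(k+k')}+e^{-ix(k+k')}+e^{ix(k-k')}+e^{-ix(k-k')}\right]dx\\&=\frac14\int_{-\infty}^{\infty}e^{ix(k+k')}\,dx+\frac14\int_{-\infty}^{\infty}e^{ix(k-k')}\,dx\\&=\frac{\pi}{2}\left[\delta(k+k')+\delta(k-k')\right].\end{aligned}$$
To go from the second to the third line, we used $\int_0^\infty e^{-iax}\,dx=\int_{-\infty}^0e^{iax}\,dx$, which the reader can easily verify; and to go from the third to the last line, we used Equation (18.28). Substituting the last result in (29.12), we obtain
$$\int_0^\infty F(x)\cos k'x\,dx=\sqrt{\frac{2}{\pi}}\,\frac{\pi}{2}\underbrace{\int_0^\infty f(k)\,\delta(k+k')\,dk}_{=0\ \text{(Reader, why?)}}+\sqrt{\frac{2}{\pi}}\,\frac{\pi}{2}\int_0^\infty f(k)\,\delta(k-k')\,dk=\sqrt{\frac{\pi}{2}}\,f(k'),$$
or
$$f(k')=\sqrt{\frac{2}{\pi}}\int_0^\infty F(x)\cos k'x\,dx.$$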
This shows that the inverse of a cosine transform is another cosine transform. Similarly, one can show that the inverse of a sine transform is another sine transform. We shall not use sine or cosine transforms, as the Fourier transform, with the exponential in the integrand, is much more convenient.
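The inversion result for the cosine transform can be checked numerically. This sketch assumes an even, smooth, rapidly decaying $f$, so truncating the $k$ integral at a finite upper limit is harmless. It applies the cosine transform twice and compares the result with the original function. The test functions are arbitrary choices.

```python
import math


def cosine_transform(f, x, upper=12.0, n=600):
    """F_c[f](x) = sqrt(2/pi) * integral_0^upper f(k) cos(kx) dk, by the trapezoid rule.

    Assumes f is smooth, even, and negligible beyond `upper`; then the endpoint
    corrections of the trapezoid rule essentially vanish and it is very accurate.
    """
    h = upper / n
    samples = [f(i * h) * math.cos(i * h * x) for i in range(n + 1)]
    integral = h * (sum(samples) - 0.5 * (samples[0] + samples[-1]))
    return math.sqrt(2 / math.pi) * integral
```

For $f(k)=e^{-k^2/2}$ the cosine transform returns the same Gaussian. Applying the transform twice to any function of this kind should reproduce it.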


29.1.3 Examples of Fourier Transform 


Example 29.1.1. Let us evaluate the inverse Fourier transform of the function defined by
$$f(x)=\begin{cases}b&\text{if }|x|<a,\\0&\text{if }|x|>a\end{cases}$$
(see Figure 29.3). From (29.5) and (29.7) we have
$$\mathfrak F^{-1}[f](k)=\tilde f(k)=\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}f(x)e^{-ikx}\,dx=\frac{b}{\sqrt{2\pi}}\int_{-a}^{a}e^{-ikx}\,dx=\frac{2ab}{\sqrt{2\pi}}\left(\frac{\sin ka}{ka}\right),$$
which is the function encountered on page 491 and depicted in Figure 18.7.

This result deserves some detailed discussion. First, note that if $a\to\infty$, the function $f(x)$ becomes a constant function over the entire real line, and we get
$$\tilde f(k)=\frac{2b}{\sqrt{2\pi}}\lim_{a\to\infty}\frac{\sin ka}{k}=\frac{2b}{\sqrt{2\pi}}\,\pi\,\delta(k)=\sqrt{2\pi}\,b\,\delta(k)$$
by Equation (18.27). This is the Fourier transform of an everywhere-constant function (see Problem 29.1). Next, let $b\to\infty$ and $a\to0$ in such a way that $2ab$, which is the area under $f(x)$, is 1. Then $f(x)$ will approach the delta function, and $\tilde f(k)$ becomes
$$\tilde f(k)=\lim_{\substack{b\to\infty\\a\to0}}\frac{2ab}{\sqrt{2\pi}}\,\frac{\sin ka}{ka}=\frac{1}{\sqrt{2\pi}}\lim_{a\to0}\frac{\sin ka}{ka}=\frac{1}{\sqrt{2\pi}}.$$
So the Fourier transform of the delta function is the constant $1/\sqrt{2\pi}$, as implied by (29.5).

Finally, we note that the width of $f(x)$ is $\Delta x=2a$, and the width of $\tilde f(k)$ is roughly the distance, on the $k$-axis, between its first two roots, $k_+$ and $k_-$, on either side of $k=0$: $\Delta k=k_+-k_-=2\pi/a$. Thus increasing the width of $f(x)$ results in a decrease in the width of $\tilde f(k)$. In other words, when the function is wide, its Fourier transform is narrow. In the limit of infinite width (a constant function), we get infinite sharpness (the delta function). The last two statements are very general. In fact, it can be shown that $\Delta x\,\Delta k\gtrsim1$ for any function $f(x)$; when the widths are defined as the standard deviations of $|f|^2$ and $|\tilde f|^2$, the sharp bound is $\Delta x\,\Delta k\ge1/2$. When both sides of this inequality are multiplied by the reduced Planck constant $\hbar\equiv h/(2\pi)$, the result is the celebrated Heisenberg uncertainty relation:¹
$$\Delta x\,\Delta p\gtrsim\hbar,$$
where $p=\hbar k$ is the momentum of the particle.

Figure 29.3: The square "bump" function.

Having obtained the transform of $f(x)$, we can write
$$f(x)=\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\frac{2b}{\sqrt{2\pi}}\,\frac{\sin ka}{k}\,e^{ikx}\,dk=\frac{b}{\pi}\int_{-\infty}^{\infty}\frac{\sin ka}{k}\,e^{ikx}\,dk.$$


Figure 29.4 shows the integral
$$\frac{b}{\pi}\int_{-K}^{K}\frac{\sin ka}{k}\,e^{ikx}\,dk$$
when $K=10$, $K=20$, and $K=100$. It is seen that by making the limits of integration larger and larger, the graph approximates Figure 29.3 better and better. ■
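The curves of Figure 29.4 can be reproduced approximately by evaluating the truncated integral directly. The sketch below is an illustration with $a=b=1$ as assumed defaults. The imaginary part of the integrand cancels by symmetry, so only the real integral over $[0,K]$ is computed, using Simpson's rule. At $K=100$ the value is close to $b$ inside the bump, close to $b/2$ at its edge, and close to 0 outside.

```python
import math


def truncated_bump(x, a=1.0, b=1.0, K=10.0, n=20000):
    """(b/pi) * integral_{-K}^{K} sin(ka)/k e^{ikx} dk for the square bump of Example 29.1.1.

    The imaginary part cancels by symmetry, leaving (2b/pi) * integral_0^K sin(ka)/k cos(kx) dk,
    which is evaluated with the composite Simpson rule (n must be even).
    """
    def integrand(k):
        ratio = a if k == 0.0 else math.sin(k * a) / k   # sin(ka)/k -> a as k -> 0
        return ratio * math.cos(k * x)

    h = K / n
    s = integrand(0.0) + integrand(K)
    s += 4 * sum(integrand((2 * i - 1) * h) for i in range(1, n // 2 + 1))
    s += 2 * sum(integrand(2 * i * h) for i in range(1, n // 2))
    return 2 * b / math.pi * s * h / 3
```

Sampling `truncated_bump` on a grid of $x$ values for several $K$ shows the overshoot near the edges that is visible in Figure 29.4.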


Figure 29.4: The thinnest plot represents $K=10$; the next thinnest plot represents $K=20$; and the thickest plot represents $K=100$.

¹ In the context of the uncertainty relation, the width of the function (the so-called wave packet) measures the uncertainty in the position $x$ of a quantum mechanical particle. Similarly, the width of the Fourier transform measures the uncertainty in $k$, which is related to momentum $p$ via $p=\hbar k$.

Example 29.1.2. Let us evaluate the Fourier transform of a Gaussian $g(x)=ae^{-bx^2}$ with $a,b>0$:
$$\tilde g(k)=\frac{a}{\sqrt{2\pi}}\int_{-\infty}^{\infty}e^{-b(x^2+ikx/b)}\,dx=\frac{ae^{-k^2/4b}}{\sqrt{2\pi}}\int_{-\infty}^{\infty}e^{-b(x+ik/2b)^2}\,dx.$$
To evaluate this integral rigorously, we would have to use the calculus of residues developed in Chapter 21. However, we can ignore the fact that the exponent is complex, substitute $y=x+ik/(2b)$, and write
$$\int_{-\infty}^{\infty}e^{-b(x+ik/2b)^2}\,dx=\int_{-\infty}^{\infty}e^{-by^2}\,dy=\sqrt{\frac{\pi}{b}}.$$
Thus, we have $\tilde g(k)=\dfrac{a}{\sqrt{2b}}\,e^{-k^2/4b}$, which is also a Gaussian.

We note again that the width of $g(x)$, which is proportional to $1/\sqrt b$, is in inverse relation to the width of $\tilde g(k)$, which is proportional to $\sqrt b$. We thus have $\Delta x\,\Delta k\sim1$. ■
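The Gaussian pair and the reciprocal-width statement can be checked numerically. The sketch below assumes sample values of $a$ and $b$. It measures each width as the standard deviation of the function itself regarded as a weight. With that definition the widths of $g$ and $\tilde g$ are $1/\sqrt{2b}$ and $\sqrt{2b}$, so their product is exactly 1 for every $b$.

```python
import cmath
import math


def inverse_fourier(g, k, half_width=20.0, n=4000):
    """(2 pi)^(-1/2) * integral g(x) e^{-ikx} dx over [-half_width, half_width], trapezoid rule."""
    h = 2 * half_width / n
    total = 0j
    for i in range(n + 1):
        x = -half_width + i * h
        weight = 0.5 if i in (0, n) else 1.0
        total += weight * g(x) * cmath.exp(-1j * k * x)
    return total * h / math.sqrt(2 * math.pi)


def rms_width(weight, half_width=20.0, n=4000):
    """Standard deviation of a nonnegative weight centred at 0 (grid sums; the spacing cancels)."""
    h = 2 * half_width / n
    points = [-half_width + i * h for i in range(n + 1)]
    norm = sum(weight(p) for p in points)
    second_moment = sum(p * p * weight(p) for p in points)
    return math.sqrt(second_moment / norm)


# Assumed sample parameters (any a, b > 0 would do).
a, b = 1.5, 0.8
g = lambda x: a * math.exp(-b * x * x)
g_tilde = lambda k: a / math.sqrt(2 * b) * math.exp(-k * k / (4 * b))
```

Changing `b` narrows one function and widens the other while leaving the product of the two widths unchanged.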


Example 29.1.3. In this example we evaluate the inverse Fourier transform of the 
Coulomb potential V(r ) of a point charge q at the origin: V(r) = k e q/r. The inverse 
Fourier transform is important in scattering experiments with atoms, molecules, and 
solids. As we shall see in the following, the inverse Fourier transform of V (r) is not 
defined. However, if we work with the Yukawa potential, 


14 (r) = 



a > 0 , 


the inverse Fourier transform will be well-defined, and we can take the limit a —> 0 
to recover the Coulomb potential. Thus, we seek the inverse Fourier transform of 

V a (r). 

We are working in three dimensions and therefore may write 


T _1 [14](k) = V a (k) = 1 


(2tt) 3 /2 


,3 —ikr k e qe 

a xe - 


It is clear from the presence of r that spherical coordinates are appropriate. We 
are free to pick any direction as the z-axis. A simplifying choice in this case is the 
direction of k. So, we let k = |k|e z = ke z , or k ■ r = kr cos 0 , where 8 is the polar 
angle in spherical coordinates. Now we have 


V C ‘W = Trm r r2rfr C s ™8d0 r dpe 

(A 71 ") ' J 0 J q J q 


— ikr cos 6 & 


The p integration is trivial and gives 2 n. The 8 integration simplifies if we make 
the substitution u = cos 8: 


[• 7T 

/ Si 

Jo 


• r\ — ikr cos 9 i/-> 

sin 6e ad = 


/: 


— ikru i 1 / ikr — ikr\ 

e au = — — (e — e ) 

ikr 


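We thus have

$$\begin{aligned}
\tilde V_\alpha(\mathbf{k}) &= \frac{k_e q\,(2\pi)}{(2\pi)^{3/2}} \int_0^\infty r^2\,dr\, \frac{1}{ikr}\left(e^{ikr} - e^{-ikr}\right)\frac{e^{-\alpha r}}{r} = \frac{k_e q}{(2\pi)^{1/2}}\,\frac{1}{ik} \int_0^\infty dr\left(e^{(-\alpha+ik)r} - e^{-(\alpha+ik)r}\right) \\
&= \frac{k_e q}{(2\pi)^{1/2}}\,\frac{1}{ik} \left[\frac{e^{(-\alpha+ik)r}}{-\alpha+ik} + \frac{e^{-(\alpha+ik)r}}{\alpha+ik}\right]_0^\infty .
\end{aligned}$$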


Note how the factor $e^{-\alpha r}$ has tamed the behavior of the exponentials at $r \to \infty$: without it, $e^{\pm ikr}$ merely oscillates and the $r$ integral does not converge. This was the reason for introducing it in the first place. Simplifying the last expression yields $\tilde V_\alpha(\mathbf{k}) = (2k_e q/\sqrt{2\pi})(k^2 + \alpha^2)^{-1}$. The parameter $\alpha$ is a measure of the range of the potential (roughly, the range is $1/\alpha$). It is clear that the larger $\alpha$ is, the smaller the range. In fact, it was in response to the short range of nuclear forces that Yukawa introduced $\alpha$. For electromagnetism, where the range is infinite, $\alpha$ becomes zero and $V_\alpha(r)$ reduces to $V(r)$. Thus, the inverse Fourier transform of the Coulomb potential is

$$\tilde V_{\text{Coul}}(\mathbf{k}) = \frac{2k_e q}{\sqrt{2\pi}}\,\frac{1}{k^2}.$$






If a charge distribution is involved, the inverse Fourier transform will be interestingly different, as the following example shows. ■
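The chain of integrations in this example can be checked numerically. The sketch below, with `keq` standing for the product $k_e q$ and hypothetical helper names, evaluates the one-dimensional radial integral that remains after the $\varphi$ and $\theta$ integrations, $\tilde V_\alpha(\mathbf k) = \sqrt{2/\pi}\,(k_e q/k)\int_0^\infty e^{-\alpha r}\sin kr\,dr$, by Simpson's rule and compares it with the closed form.

```python
import math

def simpson(f, a, b, n=20000):
    """Composite Simpson rule for the integral of f over [a, b]; n is rounded up to even."""
    n += n % 2
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def yukawa_ft_numeric(k, alpha, keq=1.0):
    """Inverse Fourier transform of V_alpha(r) = keq exp(-alpha r)/r for k > 0, reduced
    (after the phi and theta integrations) to sqrt(2/pi) (keq/k) * int_0^inf exp(-alpha r) sin(k r) dr."""
    r_max = 40.0 / alpha  # exp(-40) ~ 4e-18, so cutting the r integral here is harmless
    radial = simpson(lambda r: math.exp(-alpha * r) * math.sin(k * r), 0.0, r_max)
    return math.sqrt(2 / math.pi) * keq / k * radial

def yukawa_ft_exact(k, alpha, keq=1.0):
    """Closed form from the text: 2 keq / (sqrt(2 pi) (k^2 + alpha^2))."""
    return 2 * keq / (math.sqrt(2 * math.pi) * (k * k + alpha * alpha))

print(yukawa_ft_numeric(1.0, 0.5), yukawa_ft_exact(1.0, 0.5))
```

Shrinking `alpha` makes the cutoff `r_max = 40/alpha` grow without bound, which mirrors the statement that the transform of the bare Coulomb potential is not defined as an ordinary integral; the closed form, by contrast, tends smoothly to $\tilde V_{\text{Coul}}$.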


Example 29.1.4. The example above deals with the electrostatic potential of a point charge. Let us now consider the case where the charge is distributed over a finite volume. Then the potential is

$$V(\mathbf{r}) = \iiint \frac{k_e q\,\rho(\mathbf{r}')}{|\mathbf{r}' - \mathbf{r}|}\,d^3x' = k_e q \int \frac{\rho(\mathbf{r}')}{|\mathbf{r}' - \mathbf{r}|}\,d^3x',$$

where $q\rho(\mathbf{r}')$ is the charge density at $\mathbf{r}'$, and we have used a single integral because $d^3x'$ already indicates the number of integrations to be performed. Note that we have normalized $\rho(\mathbf{r}')$ so that its integral over the volume is 1. Figure 29.5 shows the geometry of the situation.

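Making a change of variables, $\mathbf{R} = \mathbf{r}' - \mathbf{r}$, or $\mathbf{r}' = \mathbf{R} + \mathbf{r}$, and $d^3x' = d^3X$, with $\mathbf{R} = (X, Y, Z)$, we get

$$\mathcal{F}^{-1}[V](\mathbf{k}) \equiv \tilde V(\mathbf{k}) = \frac{1}{(2\pi)^{3/2}} \int d^3x\, e^{-i\mathbf{k}\cdot\mathbf{r}}\, k_e q \int \frac{\rho(\mathbf{R} + \mathbf{r})}{R}\,d^3X. \qquad (29.13)$$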


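To evaluate Equation (29.13), we substitute for $\rho(\mathbf{R} + \mathbf{r})$ in terms of its Fourier transform,

$$\rho(\mathbf{R} + \mathbf{r}) = \frac{1}{(2\pi)^{3/2}} \int d^3k'\,\tilde\rho(\mathbf{k}')\,e^{i\mathbf{k}'\cdot(\mathbf{R} + \mathbf{r})}. \qquad (29.14)$$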

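Combining (29.13) and (29.14), we obtain

$$\begin{aligned}
\tilde V(\mathbf{k}) &= \frac{k_e q}{(2\pi)^3} \int d^3x\, d^3X\, d^3k'\, \frac{e^{i\mathbf{k}'\cdot\mathbf{R}}}{R}\,\tilde\rho(\mathbf{k}')\, e^{i\mathbf{r}\cdot(\mathbf{k}' - \mathbf{k})} \\
&= k_e q \int d^3X\, d^3k'\, \frac{e^{i\mathbf{k}'\cdot\mathbf{R}}}{R}\,\tilde\rho(\mathbf{k}')\; \underbrace{\frac{1}{(2\pi)^3} \int d^3x\, e^{i\mathbf{r}\cdot(\mathbf{k}' - \mathbf{k})}}_{=\,\delta(\mathbf{k}' - \mathbf{k})\text{ by Equation (18.30)}} \\
&= k_e q\,\tilde\rho(\mathbf{k}) \int d^3X\, \frac{e^{i\mathbf{k}\cdot\mathbf{R}}}{R}. \qquad (29.15)
\end{aligned}$$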



Figure 29.5: The inverse Fourier transform of the potential of a continuous charge 
distribution at P is calculated using this geometry. 


What is nice about this result is that the contribution of the charge distribution, $\tilde\rho(\mathbf{k})$, has been completely factored out. The integral, aside from a constant and a change in the sign of $\mathbf{k}$, is simply the inverse Fourier transform of the Coulomb potential of a point charge obtained in the previous example. We can therefore write Equation (29.15) as

$$\tilde V(\mathbf{k}) = (2\pi)^{3/2}\,\tilde\rho(\mathbf{k})\,\tilde V_{\text{Coul}}(-\mathbf{k}) = 4\pi k_e q\,\frac{\tilde\rho(\mathbf{k})}{k^2}.$$

This equation is important in analyzing the structure of atomic particles. The inverse Fourier transform $\tilde V(\mathbf{k})$ is directly measurable in scattering experiments. In a typical experiment a (charged) target is probed with a charged point particle (an electron). If the analysis of the scattering data shows a deviation from $1/k^2$ in the behavior of $\tilde V(\mathbf{k})$, then it can be concluded that the target particle has a charge distribution. More specifically, a plot of $k^2\tilde V(\mathbf{k})$ versus $k$ gives the variation of $\tilde\rho(\mathbf{k})$, the form factor, with $k$. If the resulting graph is a constant, then $\tilde\rho(\mathbf{k})$ is a constant, and the target is a point particle [$\tilde\rho(\mathbf{k})$ is a constant for point particles, for which $\rho(\mathbf{r}') = \delta(\mathbf{r}')$ and $\tilde\rho(\mathbf{k}) = (2\pi)^{-3/2}$]. If there is any deviation from a constant function, $\tilde\rho(\mathbf{k})$ must have a dependence on $k$, and correspondingly, the target particle must have a charge distribution.

The above discussion, when generalized to four-dimensional relativistic spacetime, was the basis for a strong argument in favor of the existence of point-like particles (quarks) inside a proton in 1968, when the results of the scattering of high-energy electrons off protons at the Stanford Linear Accelerator Center revealed deviation from a constant for the proton form factor. ■
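To make the form-factor argument concrete, the sketch below computes $F(k) \equiv (2\pi)^{3/2}\tilde\rho(\mathbf k) = 4\pi\int_0^\infty \rho(r)\,r^2\,\frac{\sin kr}{kr}\,dr$ for a spherically symmetric, unit-normalized density, and from it $k^2\tilde V(\mathbf k) = 4\pi k_e q\,\tilde\rho(\mathbf k) = 2k_e q F(k)/\sqrt{2\pi}$. The Gaussian density is an illustrative assumption, not a model of the proton; for it $F(k) = e^{-k^2\sigma^2/2}$, which falls off with $k$, while a very narrow Gaussian behaves like a point charge and gives a nearly constant $k^2\tilde V$. All names are hypothetical.

```python
import math

def simpson(f, a, b, n=20000):
    """Composite Simpson rule for the integral of f over [a, b]; n is rounded up to even."""
    n += n % 2
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def form_factor(rho, k, r_max):
    """F(k) = 4 pi * int_0^r_max rho(r) r^2 sin(k r)/(k r) dr for a spherically symmetric,
    unit-normalized density rho (the angular integral gives 4 pi sin(kr)/(kr))."""
    def integrand(r):
        x = k * r
        sinc = 1.0 if x == 0.0 else math.sin(x) / x
        return 4 * math.pi * rho(r) * r * r * sinc
    return simpson(integrand, 0.0, r_max)

def gaussian_density(sigma):
    """Unit-normalized Gaussian charge density (an illustrative choice)."""
    norm = (2 * math.pi * sigma * sigma) ** -1.5
    return lambda r: norm * math.exp(-r * r / (2 * sigma * sigma))

def k2_vtilde(k, rho, r_max, keq=1.0):
    """k^2 V~(k) = 4 pi keq rho~(k) = 2 keq F(k) / sqrt(2 pi); constant for a point charge."""
    return 2 * keq * form_factor(rho, k, r_max) / math.sqrt(2 * math.pi)

rho = gaussian_density(0.8)
for k in (0.0, 1.0, 2.5):
    print(k, k2_vtilde(k, rho, 12 * 0.8))  # decreases with k: an extended charge distribution
```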


29.1.4 Application to Differential Equations 

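The Fourier transform is very useful for solving differential equations. This is because the derivative operator in $\mathbf{r}$ space turns into ordinary multiplication in $\mathbf{k}$ space. For example, if we differentiate $f(\mathbf{r})$ in Equation (29.10) with respect to $x_j$, we obtain

$$\frac{\partial f}{\partial x_j} = \frac{1}{(2\pi)^{n/2}} \int d^nk\, \frac{\partial}{\partial x_j} e^{i(k_1x_1 + \cdots + k_jx_j + \cdots + k_nx_n)}\, \tilde f(\mathbf{k}) = \frac{1}{(2\pi)^{n/2}} \int d^nk\,(ik_j)\,e^{i\mathbf{k}\cdot\mathbf{r}}\, \tilde f(\mathbf{k}). \qquad (29.16)$$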


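That is, every time we differentiate with respect to any component of $\mathbf{r}$, the corresponding component of $\mathbf{k}$ “comes down.” Thus, the $n$-dimensional gradient and Laplacian can be written as

$$\nabla f(\mathbf{r}) = (2\pi)^{-n/2} \int d^nk\,(i\mathbf{k})\,e^{i\mathbf{k}\cdot\mathbf{r}}\,\tilde f(\mathbf{k}), \qquad \nabla^2 f(\mathbf{r}) = (2\pi)^{-n/2} \int d^nk\,(-k^2)\,e^{i\mathbf{k}\cdot\mathbf{r}}\,\tilde f(\mathbf{k}). \qquad (29.17)$$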
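The rule that each derivative brings down a factor $ik_j$ has a direct discrete analogue. The sketch below is an illustration using a plain $O(N^2)$ discrete Fourier transform rather than the continuous transform of the text: it differentiates a periodic function sampled on $[0, 2\pi)$ by multiplying its Fourier coefficients by $im$ and transforming back. The test function and grid size are arbitrary choices.

```python
import cmath
import math

def spectral_derivative(samples):
    """Differentiate a periodic function sampled at x_j = 2*pi*j/N (N even) on [0, 2*pi)
    by multiplying its discrete Fourier coefficients by i*m, the discrete analogue of (29.16)."""
    N = len(samples)
    xs = [2 * math.pi * j / N for j in range(N)]
    ms = [m if m < N // 2 else m - N for m in range(N)]  # signed wavenumbers -N/2 .. N/2 - 1
    coeffs = [sum(f * cmath.exp(-1j * m * x) for f, x in zip(samples, xs)) / N for m in ms]
    deriv = []
    for x in xs:
        total = 0j
        for m, c in zip(ms, coeffs):
            if m != -N // 2:  # skip the unpaired Nyquist mode, whose derivative is ambiguous
                total += 1j * m * c * cmath.exp(1j * m * x)
        deriv.append(total.real)
    return xs, deriv

N = 32
xs, d = spectral_derivative([math.sin(3 * x) + math.cos(x) for x in (2 * math.pi * j / N for j in range(N))])
print(max(abs(v - (3 * math.cos(3 * x) - math.sin(x))) for x, v in zip(xs, d)))  # rounding-level error
```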
Let us illustrate the above points with a simple example. Consider the ordinary second-order differential equation

$$C_2\frac{d^2x}{dt^2} + C_1\frac{dx}{dt} + C_0x = f(t), \qquad (29.18)$$

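where $C_0$, $C_1$, and $C_2$ are constants. We can “solve” this equation by simply substituting the following in it:

$$x(t) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} d\omega\,\tilde x(\omega)\,e^{i\omega t}, \qquad \frac{dx}{dt} = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} d\omega\,\tilde x(\omega)\,(i\omega)\,e^{i\omega t},$$

$$\frac{d^2x}{dt^2} = -\frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} d\omega\,\tilde x(\omega)\,\omega^2 e^{i\omega t}, \qquad f(t) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} d\omega\,\tilde f(\omega)\,e^{i\omega t}.$$

This gives

$$\frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} d\omega\,\tilde x(\omega)\left(-C_2\omega^2 + iC_1\omega + C_0\right)e^{i\omega t} = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} d\omega\,\tilde f(\omega)\,e^{i\omega t}.$$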

Equating the coefficients of $e^{i\omega t}$ on both sides, we obtain

$$\tilde x(\omega) = \frac{\tilde f(\omega)}{-C_2\omega^2 + iC_1\omega + C_0}. \qquad (29.19)$$


If we know $\tilde f(\omega)$ [which can be obtained from $f(t)$], we can calculate $x(t)$ by Fourier-transforming $\tilde x(\omega)$. The resulting integrals are not generally easy to evaluate. In some cases the methods of complex analysis may be helpful; in others numerical integration may be the last resort. However, the real power of the Fourier transform lies in the formal analysis of differential equations.
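For a sinusoidal driving force the formal solution (29.19) can be evaluated in closed form, because $\tilde f(\omega)$ is then a pair of delta functions (as in the next example). The sketch below assumes $f(t) = F_0\cos\omega_0 t$, so that the particular solution is $x(t) = F_0\,\mathrm{Re}\big[e^{i\omega_0 t}/(-C_2\omega_0^2 + iC_1\omega_0 + C_0)\big]$, and checks by finite differences that this $x(t)$ satisfies (29.18). The function names and parameter values are illustrative.

```python
import cmath
import math

def transfer(omega, C2, C1, C0):
    """The factor 1/(-C2 w^2 + i C1 w + C0) that multiplies f~(w) in Eq. (29.19)."""
    return 1.0 / (-C2 * omega ** 2 + 1j * C1 * omega + C0)

def driven_response(t, omega0, F0, C2, C1, C0):
    """Particular solution of C2 x'' + C1 x' + C0 x = F0 cos(omega0 t) (assumed driving force)."""
    return F0 * (transfer(omega0, C2, C1, C0) * cmath.exp(1j * omega0 * t)).real

def residual(t, omega0, F0, C2, C1, C0, h=1e-4):
    """Left side minus right side of Eq. (29.18), with derivatives from central differences."""
    x = lambda s: driven_response(s, omega0, F0, C2, C1, C0)
    d1 = (x(t + h) - x(t - h)) / (2 * h)
    d2 = (x(t + h) - 2 * x(t) + x(t - h)) / (h * h)
    return C2 * d2 + C1 * d1 + C0 * x(t) - F0 * math.cos(omega0 * t)

print(residual(0.7, omega0=1.3, F0=1.5, C2=2.0, C1=0.5, C0=3.0))  # small, limited by finite differences
```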


Example 29.1.5. A harmonically driven circuit consisting of an inductor $L$, a resistor $R$, and a capacitor $C$ obeys the following differential equation:

$$L\frac{d^2Q}{dt^2} + R\frac{dQ}{dt} + \frac{Q}{C} = \mathcal{E}\cos\omega_0 t,$$

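where $Q$ is the charge on the capacitor and $\mathcal{E}$ is the amplitude of the driving voltage. Except for the constants, this is identical to (29.18). The Fourier transform of cosine is a sum of two Dirac delta functions (see Problem 29.6). Substituting in Equation (29.19), we obtain

$$\tilde Q(\omega) = \mathcal{E}\sqrt{\frac{\pi}{2}}\;\frac{\delta(\omega - \omega_0) + \delta(\omega + \omega_0)}{-L\omega^2 + iR\omega + (1/C)}.$$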


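Therefore,

$$\begin{aligned}
Q(t) &= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} d\omega\,\tilde Q(\omega)\,e^{i\omega t} = \frac{\mathcal{E}}{2} \int_{-\infty}^{\infty} d\omega\, \frac{\delta(\omega - \omega_0) + \delta(\omega + \omega_0)}{-L\omega^2 + iR\omega + (1/C)}\,e^{i\omega t} \\
&= \frac{\mathcal{E}}{2}\left(\frac{e^{i\omega_0 t}}{-L\omega_0^2 + iR\omega_0 + (1/C)} + \frac{e^{-i\omega_0 t}}{-L\omega_0^2 - iR\omega_0 + (1/C)}\right).
\end{aligned}$$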
Noting that the second term in the outer parentheses is the complex conjugate of the first term, we obtain

$$Q(t) = \mathcal{E}\,\mathrm{Re}\left(\frac{e^{i\omega_0 t}}{-L\omega_0^2 + iR\omega_0 + (1/C)}\right),$$

and using $\mathrm{Re}(z_1/z_2) = (x_1x_2 + y_1y_2)/|z_2|^2$, where $x$ and $y$ are the real and imaginary parts of a complex number $z$, we finally obtain

$$Q(t) = \mathcal{E}\,\frac{[(1/C) - L\omega_0^2]\cos\omega_0 t + R\omega_0\sin\omega_0 t}{[(1/C) - L\omega_0^2]^2 + R^2\omega_0^2}$$

for the charge on the capacitor and

$$\frac{dQ}{dt} = \mathcal{E}\,\frac{-[(1/C) - L\omega_0^2]\,\omega_0\sin\omega_0 t + R\omega_0^2\cos\omega_0 t}{[(1/C) - L\omega_0^2]^2 + R^2\omega_0^2}$$

for the current in the circuit. Note the occurrence of a resonance (large current) at the voltage source frequency of $\omega_0 = 1/\sqrt{LC}$. Note also that the $Q(t)$ obtained above is a particular, not the most general, solution of the differential equation (see Box 24.4.1). ■
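The resonance statement can be checked directly from the formulas for $Q(t)$ and $dQ/dt$. In the sketch below, `emf` stands for $\mathcal E$, the component values are arbitrary, and the current amplitude $\mathcal E\omega_0/\sqrt{[(1/C) - L\omega_0^2]^2 + R^2\omega_0^2}$ follows from the current formula above; its maximum, $\mathcal E/R$, occurs at $\omega_0 = 1/\sqrt{LC}$. Function names are illustrative.

```python
import math

def charge(t, L, R, C, emf, w0):
    """Q(t) from the example: emf (D cos w0 t + R w0 sin w0 t) / (D^2 + R^2 w0^2), D = 1/C - L w0^2."""
    D = 1.0 / C - L * w0 * w0
    return emf * (D * math.cos(w0 * t) + R * w0 * math.sin(w0 * t)) / (D * D + (R * w0) ** 2)

def current(t, L, R, C, emf, w0):
    """dQ/dt from the example."""
    D = 1.0 / C - L * w0 * w0
    return emf * (-D * w0 * math.sin(w0 * t) + R * w0 * w0 * math.cos(w0 * t)) / (D * D + (R * w0) ** 2)

def current_amplitude(L, R, C, emf, w0):
    """Peak value of dQ/dt over a cycle: emf w0 / sqrt(D^2 + R^2 w0^2)."""
    D = 1.0 / C - L * w0 * w0
    return emf * w0 / math.sqrt(D * D + (R * w0) ** 2)

L_, R_, C_, emf = 0.5, 4.0, 2e-3, 10.0
grid = [1.0 + 0.01 * i for i in range(10000)]
best = max(grid, key=lambda w: current_amplitude(L_, R_, C_, emf, w))
print(best, 1 / math.sqrt(L_ * C_))  # the scan peaks next to 1/sqrt(LC)
```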


Example 29.1.6. The one-dimensional heat equation, the PDE governing the behavior of the temperature $T(x,t)$ along a rod, is

$$\frac{\partial T}{\partial t} = \kappa^2\,\frac{\partial^2 T}{\partial x^2}, \qquad (29.20)$$

where we have used $\kappa$ [see Equation (22.3)] to leave $k$ exclusively for Fourier transforms. Write $T(x,t)$ as a Fourier transform in the $x$ variable,

$$T(x,t) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \tilde T(k,t)\,e^{ikx}\,dk, \qquad (29.21)$$


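and substitute in (29.20) to obtain

$$\frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \frac{\partial \tilde T}{\partial t}\,e^{ikx}\,dk = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} (-\kappa^2k^2)\,\tilde T(k,t)\,e^{ikx}\,dk \quad\Longrightarrow\quad \frac{\partial \tilde T}{\partial t} = -\kappa^2k^2\,\tilde T(k,t).$$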


This is a first-order ordinary differential equation, which can be easily solved:

$$\tilde T(k,t) = C(k)\,e^{-\kappa^2k^2t}, \qquad (29.22)$$


where $C(k)$ is the constant of integration, which could depend on $k$. Now suppose that initially the temperature distribution on the rod is $T(x,0) = f(x)$, where $f(x)$ is a given known function. Then the last equation gives $\tilde T(k,0) = C(k)$, and (29.21) yields

$$f(x) = T(x,0) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \tilde T(k,0)\,e^{ikx}\,dk = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} C(k)\,e^{ikx}\,dk,$$

showing that $C(k)$ is the inverse Fourier transform of $f(x)$:

$$C(k) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} f(x)\,e^{-ikx}\,dx.$$
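Substituting this in (29.22) and the result in (29.21) yields

$$T(x,t) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \left(\frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} f(y)\,e^{-iky}\,dy\right) e^{-\kappa^2k^2t}\,e^{ikx}\,dk = \frac{1}{2\pi} \int_{-\infty}^{\infty} f(y)\,dy \int_{-\infty}^{\infty} e^{-\kappa^2k^2t + ik(x-y)}\,dk. \qquad (29.23)$$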

The inner integral can be done by completing the square in the exponent as in Example 29.1.2. The result is

$$\int_{-\infty}^{\infty} e^{-\kappa^2k^2t + ik(x-y)}\,dk = \sqrt{\frac{\pi}{\kappa^2 t}}\;e^{-(x-y)^2/(4\kappa^2t)}.$$

Putting this in (29.23) and noting that $f(y) = T(y,0)$, we finally obtain

$$T(x,t) = \frac{1}{\sqrt{4\pi\kappa^2t}} \int_{-\infty}^{\infty} T(y,0)\,e^{-(x-y)^2/(4\kappa^2t)}\,dy. \qquad (29.24)$$


If we know the initial shape of the temperature distribution $T(y,0)$ on the rod, we can calculate the temperature at every point of the rod for any time. A simple example is if the temperature is infinitely hot at one point of the rod, say $x_0$, and zero everywhere else. Then

$$T(y,0) = T_0\,\delta(y - x_0),$$

and (29.24) yields

$$T(x,t) = \frac{1}{\sqrt{4\pi\kappa^2t}} \int_{-\infty}^{\infty} T_0\,\delta(y - x_0)\,e^{-(x-y)^2/(4\kappa^2t)}\,dy = \frac{T_0}{\sqrt{4\pi\kappa^2t}}\,e^{-(x-x_0)^2/(4\kappa^2t)}.$$
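Equation (29.24) is a recipe that can be carried out by quadrature for any initial profile. The sketch below applies it to a Gaussian initial temperature (an assumed test case whose exact evolution is a wider Gaussian of variance $s^2 + 2\kappa^2 t$) and checks by finite differences that the point-source result satisfies (29.20). Names and parameter values are illustrative.

```python
import math

def simpson(f, a, b, n=4000):
    """Composite Simpson rule for the integral of f over [a, b]; n is rounded up to even."""
    n += n % 2
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def evolve(T_initial, x, t, kappa=1.0):
    """Eq. (29.24): T(x,t) = (4 pi kappa^2 t)^(-1/2) * int T(y,0) exp(-(x-y)^2/(4 kappa^2 t)) dy."""
    a = 4 * kappa * kappa * t
    half = 12 * math.sqrt(a)  # the kernel is exp(-144) beyond this distance from x
    integrand = lambda y: T_initial(y) * math.exp(-(x - y) ** 2 / a)
    return simpson(integrand, x - half, x + half) / math.sqrt(math.pi * a)

def point_source(x, t, x0=0.0, T0=1.0, kappa=1.0):
    """Closed-form result for T(y,0) = T0 delta(y - x0)."""
    a = 4 * kappa * kappa * t
    return T0 * math.exp(-(x - x0) ** 2 / a) / math.sqrt(math.pi * a)

def pde_residual(x, t, kappa=1.0, h=1e-4):
    """dT/dt - kappa^2 d2T/dx2 for the point-source solution, by central differences."""
    T = lambda xx, tt: point_source(xx, tt, kappa=kappa)
    dT_dt = (T(x, t + h) - T(x, t - h)) / (2 * h)
    d2T = (T(x + h, t) - 2 * T(x, t) + T(x - h, t)) / (h * h)
    return dT_dt - kappa * kappa * d2T

s0, kap, t = 0.5, 0.7, 0.3
print(evolve(lambda y: math.exp(-y * y / (2 * s0 ** 2)), 0.0, t, kappa=kap))
```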


29.2 Fourier Transform and Green’s Functions 

Suppose you are given a system of $n$ linear equations in $n$ unknowns and asked to solve them. An elegant approach would be to use matrices. So, let $\mathsf{L}$ be the matrix of coefficients, $\mathbf{y}$ the column vector of the $n$ unknowns, and $\mathbf{f}$ the column vector of the constants appearing on the right-hand side of the system of equations. The matrix equation and the corresponding system of equations would look like the following:

$$\mathsf{L}\mathbf{y} = \mathbf{f}, \qquad \sum_{j=1}^{n} L_{ij}\,y_j = f_i, \quad i = 1, 2, \ldots, n. \qquad (29.25)$$

If $\mathsf{L}$ has an inverse $\mathsf{G}$, i.e., if there is a matrix $\mathsf{G}$ such that $\mathsf{LG} = \mathbf{1}$, then the solution to the above equation can formally be written as $\mathbf{y} = \mathsf{G}\mathbf{f}$, or in component form as

$$y_i = \sum_{j=1}^{n} G_{ij}\,f_j, \quad i = 1, 2, \ldots, n. \qquad (29.26)$$
Thus, the problem of solving the system of linear equations turns into the problem of finding the inverse of the coefficient matrix; and this is independent of what $\mathbf{f}$ is! Once I know the inverse of $\mathsf{L}$, I can solve any system of linear equations, regardless of the constants on the right-hand side. Recalling that the elements of the unit matrix are just the correctly labeled Kronecker delta, the equation that $\mathsf{G}$ has to satisfy becomes

$$\mathsf{LG} = \mathbf{1}, \qquad \sum_{j=1}^{n} L_{ij}\,G_{jk} = \delta_{ik}, \quad i, k = 1, 2, \ldots, n. \qquad (29.27)$$
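The point that one inverse serves every right-hand side can be illustrated with a small exact computation. The sketch below (hypothetical helper names, an arbitrary nonsingular 3×3 matrix) computes $\mathsf G = \mathsf L^{-1}$ by Gauss-Jordan elimination in rational arithmetic, checks (29.27), and then reuses $\mathsf G$, as in (29.26), for two different vectors $\mathbf f$.

```python
from fractions import Fraction

def inverse(L):
    """Gauss-Jordan inversion of a square matrix given as a list of rows, in exact arithmetic."""
    n = len(L)
    A = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(L)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if A[r][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is singular: no inverse G exists")
        A[col], A[pivot] = A[pivot], A[col]
        p = A[col][col]
        A[col] = [v / p for v in A[col]]
        for r in range(n):
            if r != col and A[r][col] != 0:
                factor = A[r][col]
                A[r] = [v - factor * w for v, w in zip(A[r], A[col])]
    return [row[n:] for row in A]

def matvec(M, v):
    """Component form of M v, as in Eq. (29.26)."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

L = [[2, 1, 0], [1, 3, 1], [0, 1, 4]]
G = inverse(L)                  # computed once ...
for f in ([1, 2, 3], [0, -5, 7]):
    print(matvec(G, f))         # ... and reused for every right-hand side
```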

Now think of a column vector $\mathbf{v}$ as a “machine”: feed the machine an integer between 1 and $n$, and it will give you a real number, i.e., the element of the column vector carrying the integer as an index. Similarly, think of a matrix $\mathsf{M}$ as another “machine” which gives you a real number if you feed it a pair of integers between 1 and $n$. Write this as

$$\mathbf{v}(i) = v_i, \quad\text{and}\quad \mathsf{M}(i,j) = M_{ij}, \qquad i, j = 1, 2, \ldots, n. \qquad (29.28)$$

Would it be beneficial to generalize the action of the machine to include all real numbers? A vector machine that feeds on real numbers is a function: feed a function a real number and it will spit out a real number. Replacing $i$ with $x$, we have $\mathbf{v}(x) = v_x \equiv v(x)$, because $v_x$ is not a common notation. Similarly, $\mathsf{M}(x,x') = M_{xx'} \equiv M(x,x')$. Furthermore, all summations have to be replaced by integrals. For example, the system of equations (29.25) becomes

$$\mathsf{L}y = f \quad\Longrightarrow\quad \int_a^b L(x,x')\,y(x')\,dx' = f(x),$$

where $(a,b)$ is a convenient interval of the real line, usually taken to be $(-\infty,\infty)$. What is the meaning of $L(x,x')$? It can be merely a function of two variables. But more interestingly, it can be a differential operator. However, a differential operator is a local operator, i.e., it is a linear combination of derivatives of various orders at a single point, say $x$. This requires the last integral above to collapse to $x$. The only way that can happen is if

$$L(x,x') = \delta(x - x')\,L(x) \equiv \delta(x - x')\,L_x, \qquad (29.29)$$

where $L_x$ is by definition a differential operator in the variable $x$.

Now that we have a differential operator which is the generalization of a matrix, how do we find its inverse? In other words, how do we generalize Equation (29.27)? We suspect that the Kronecker delta turns into a Dirac delta function. With this suspicion, we generalize (29.27) to

$$\mathsf{LG} = \mathbf{1}, \qquad \int_a^b L(x,x')\,G(x',x_0)\,dx' = \delta(x - x_0).$$

Substituting (29.29) in the second equation yields

$$\int_a^b \delta(x - x')\,L_x G(x',x_0)\,dx' = \delta(x - x_0),$$
or

$$L_x G(x,x_0) = \delta(x - x_0). \qquad (29.30)$$

A function which satisfies this equation is called the Green’s function for the differential operator $L_x$. If we can find the Green’s function for $L_x$, then the solution to the differential equation $L_x y(x) = f(x)$ can be written as the generalization of (29.26):

$$y(x) = \int_a^b G(x,x')\,f(x')\,dx'. \qquad (29.31)$$

To show this, note that

$$L_x y(x) = L_x \int_a^b G(x,x')\,f(x')\,dx' = \int_a^b L_x G(x,x')\,f(x')\,dx' = \int_a^b \delta(x - x')\,f(x')\,dx' = f(x).$$
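The continuous version can be tried on the simplest second-order operator. The sketch below assumes $L_x = d^2/dx^2$ on $(0,1)$ with boundary conditions $y(0) = y(1) = 0$ (an assumption; boundary conditions are not discussed in the text at this point), for which $G(x,x') = x(x'-1)$ when $x \le x'$ and $x'(x-1)$ when $x > x'$. This $G$ is linear on either side of $x = x'$ and its slope jumps by 1 there, so $L_xG = \delta(x - x')$. Evaluating (29.31) by quadrature reproduces the exact solutions for two sample right-hand sides. Names are illustrative.

```python
import math

def simpson(f, a, b, n=2000):
    """Composite Simpson rule for the integral of f over [a, b]; n is rounded up to even."""
    n += n % 2
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def green_dirichlet(x, xp):
    """Assumed example: Green's function of d^2/dx^2 on (0, 1) with y(0) = y(1) = 0."""
    return x * (xp - 1.0) if x <= xp else xp * (x - 1.0)

def solve_with_green(f, x):
    """Eq. (29.31): y(x) = int_0^1 G(x, x') f(x') dx', split at the kink x' = x."""
    g = lambda xp: green_dirichlet(x, xp) * f(xp)
    return simpson(g, 0.0, x) + simpson(g, x, 1.0)

# y'' = 1 gives y = x(x - 1)/2; y'' = sin(pi x) gives y = -sin(pi x)/pi^2
print(solve_with_green(lambda xp: 1.0, 0.25), 0.25 * (0.25 - 1) / 2)
```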

Green’s functions are powerful tools for solving differential equations. Ordinary differential equations have ordinary derivatives and the differential operator involves a single variable. Partial differential equations correspond to differential operators involving several variables. If $\mathbf{x}$ denotes the collection of all these variables, then the differential operator can be denoted by $L_{\mathbf{x}}$ and the Green’s function by $G(\mathbf{x},\mathbf{x}')$, which satisfies the partial differential equation

$$L_{\mathbf{x}} G(\mathbf{x},\mathbf{x}') = \delta(\mathbf{x} - \mathbf{x}'). \qquad (29.32)$$

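Since the Fourier transform turns differentiation into multiplication, and the Dirac delta function has a very simple inverse Fourier transform, Green’s functions are very elegantly calculated via Fourier transform techniques. For example, if $L_{\mathbf{x}}$ is a second-order partial differential operator with constant coefficients in $n$ variables, then Fourier transforming only the $\mathbf{x}$ variables and writing

$$G(\mathbf{x},\mathbf{x}') = \frac{1}{(2\pi)^{n/2}} \int d^nk\,\tilde G(\mathbf{k},\mathbf{x}')\,e^{i\mathbf{k}\cdot\mathbf{x}}, \qquad \delta(\mathbf{x} - \mathbf{x}') = \frac{1}{(2\pi)^n} \int d^nk\,e^{i\mathbf{k}\cdot(\mathbf{x} - \mathbf{x}')}, \qquad (29.33)$$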


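the differential equation (29.32) becomes

$$\frac{1}{(2\pi)^{n/2}} \int d^nk\,\tilde G(\mathbf{k},\mathbf{x}')\,L_{\mathbf{x}} e^{i\mathbf{k}\cdot\mathbf{x}} = \frac{1}{(2\pi)^n} \int d^nk\,e^{i\mathbf{k}\cdot(\mathbf{x} - \mathbf{x}')} = \frac{1}{(2\pi)^n} \int d^nk\,e^{i\mathbf{k}\cdot\mathbf{x}}\,e^{-i\mathbf{k}\cdot\mathbf{x}'}.$$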


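When $L_{\mathbf{x}}$ acts on the exponential, it produces a polynomial $p(k_j)$ of second degree in the components $k_j$ of $\mathbf{k}$. Therefore, equating the coefficient of $e^{i\mathbf{k}\cdot\mathbf{x}}$ on both sides, we obtain

$$\tilde G(\mathbf{k},\mathbf{x}')\,p(k_j) = \frac{e^{-i\mathbf{k}\cdot\mathbf{x}'}}{(2\pi)^{n/2}} \qquad\text{or}\qquad \tilde G(\mathbf{k},\mathbf{x}') = \frac{e^{-i\mathbf{k}\cdot\mathbf{x}'}}{(2\pi)^{n/2}\,p(k_j)}.$$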
Substituting this in the first equation of (29.33), we get

$$G(\mathbf{x},\mathbf{x}') = \frac{1}{(2\pi)^n} \int d^nk\,\frac{e^{i\mathbf{k}\cdot(\mathbf{x} - \mathbf{x}')}}{p(k_j)},$$

which shows that the Green’s function is a function of the difference between its arguments. We therefore have

$$G(\mathbf{x} - \mathbf{x}') = \frac{1}{(2\pi)^n} \int d^nk\,\frac{e^{i\mathbf{k}\cdot(\mathbf{x} - \mathbf{x}')}}{p(k_j)}. \qquad (29.34)$$


29.2.1 Green’s Function for the Laplacian 

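Equation (29.17) tells us that $p(k_j) = -k^2$ for the Laplacian. Thus, with $n = 3$, (29.34) becomes

$$G(\mathbf{r} - \mathbf{r}') = -\frac{1}{(2\pi)^3} \int d^3k\,\frac{e^{i\mathbf{k}\cdot(\mathbf{r} - \mathbf{r}')}}{k^2}. \qquad (29.35)$$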

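To evaluate this integral, use spherical coordinates in the $\mathbf{k}$-space, and choose the polar axis to be along the vector $\mathbf{r} - \mathbf{r}'$. Then, $d^3k = k^2\sin\theta\,dk\,d\theta\,d\varphi$ and (29.35) becomes

$$G(\mathbf{r} - \mathbf{r}') = -\frac{1}{(2\pi)^3} \int_0^\infty k^2\,dk \int_0^\pi \sin\theta\,d\theta \int_0^{2\pi} d\varphi\,\frac{e^{ik|\mathbf{r} - \mathbf{r}'|\cos\theta}}{k^2}.$$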


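The $\varphi$ integration gives $2\pi$. For the $\theta$ integration, let $u = \cos\theta$. Then the integral becomes

$$\begin{aligned}
G(\mathbf{r} - \mathbf{r}') &= -\frac{1}{(2\pi)^2} \int_0^\infty dk \int_{-1}^{1} du\,e^{ik|\mathbf{r} - \mathbf{r}'|u} = -\frac{1}{(2\pi)^2} \int_0^\infty dk\,\frac{e^{ik|\mathbf{r} - \mathbf{r}'|} - e^{-ik|\mathbf{r} - \mathbf{r}'|}}{ik|\mathbf{r} - \mathbf{r}'|} \\
&= -\frac{2}{(2\pi)^2\,|\mathbf{r} - \mathbf{r}'|} \int_0^\infty dk\,\frac{\sin(k|\mathbf{r} - \mathbf{r}'|)}{k}.
\end{aligned}$$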


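Example 21.3.3 calculated the last integral and yielded $\pi/2$ for it. We thus obtain the important result that for the Laplacian, the Green’s function is

$$G(\mathbf{r} - \mathbf{r}') = -\frac{1}{4\pi|\mathbf{r} - \mathbf{r}'|}. \qquad (29.36)$$

From this and $\nabla^2 G(\mathbf{r} - \mathbf{r}') = \delta(\mathbf{r} - \mathbf{r}')$, we obtain another important result:

$$\nabla^2\left(\frac{1}{|\mathbf{r} - \mathbf{r}'|}\right) = -4\pi\,\delta(\mathbf{r} - \mathbf{r}'). \qquad (29.37)$$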
With the Green’s function of the Laplacian at our disposal, we can solve the Poisson equation $\nabla^2\Phi(\mathbf{r}) = -4\pi k_e\rho(\mathbf{r})$ in electrostatics, using the three-dimensional version of Equation (29.31):

$$\Phi(\mathbf{r}) = -4\pi k_e \int d^3x'\,G(\mathbf{r} - \mathbf{r}')\,\rho(\mathbf{r}') = k_e \int d^3x'\,\frac{\rho(\mathbf{r}')}{|\mathbf{r} - \mathbf{r}'|},$$

which is the electrostatic potential of a charge distribution described by the volume charge density $\rho(\mathbf{r})$.
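As a check on this solution of the Poisson equation, the sketch below evaluates $k_e\int d^3x'\,\rho(\mathbf r')/|\mathbf r - \mathbf r'|$ for a spherically symmetric density. It uses the standard result that the average of $1/|\mathbf r - \mathbf r'|$ over the directions of $\mathbf r'$ is $1/\max(r, r')$, and compares with the known potential $\mathrm{erf}\big(r/(\sqrt2\,\sigma)\big)/r$ of a unit-normalized Gaussian charge distribution, with the overall constant (`keq`) set to 1. The Gaussian and the helper names are illustrative assumptions.

```python
import math

def simpson(f, a, b, n=4000):
    """Composite Simpson rule for the integral of f over [a, b]; n is rounded up to even."""
    n += n % 2
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def potential_via_green(r, rho, r_max, keq=1.0):
    """keq * int rho(r') / |r - r'| d^3x' for spherically symmetric rho, after the angular
    integration: 4 pi keq [ (1/r) int_0^r rho r'^2 dr' + int_r^r_max rho r' dr' ]."""
    inner = simpson(lambda rp: rho(rp) * rp * rp, 0.0, r) / r
    outer = simpson(lambda rp: rho(rp) * rp, r, r_max)
    return 4 * math.pi * keq * (inner + outer)

def gaussian_density(sigma):
    """Unit-normalized Gaussian charge density (an illustrative choice)."""
    norm = (2 * math.pi * sigma * sigma) ** -1.5
    return lambda r: norm * math.exp(-r * r / (2 * sigma * sigma))

rho = gaussian_density(1.0)
print(potential_via_green(1.0, rho, 12.0), math.erf(1.0 / math.sqrt(2)))
```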


29.2.2 Green’s Function for the Heat Equation 

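The heat equation was given in (22.3), which, due to the special significance attached to the letter $k$ in this chapter, we write as

$$\frac{\partial T}{\partial t} = \kappa^2\nabla^2 T(\mathbf{r}) \qquad\text{or}\qquad \frac{\partial T}{\partial t} - \kappa^2\nabla^2 T(\mathbf{r}) = 0. \qquad (29.38)$$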

This is a PDE in four variables. We let $t$ be the “zeroth” coordinate, and $\mathbf{r}$ the remaining three. Similarly, the four-dimensional $k$ space consists of $k_0$ and $\mathbf{k}$. The polynomial $p(k_j)$ of Equation (29.34) is

$$p(k_j) = ik_0 + \kappa^2(k_1^2 + k_2^2 + k_3^2) = ik_0 + \kappa^2k^2.$$

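Hence, with $n = 4$, (29.34) gives

$$G(x - x') = \frac{1}{(2\pi)^4} \int d^4k\,\frac{e^{ik_0(x_0 - x_0') + i\mathbf{k}\cdot(\mathbf{r} - \mathbf{r}')}}{ik_0 + \kappa^2k^2} = \frac{1}{(2\pi)^4} \int d^3k\,e^{i\mathbf{k}\cdot(\mathbf{r} - \mathbf{r}')} \int_{-\infty}^{\infty} dk_0\,\frac{e^{ik_0(x_0 - x_0')}}{ik_0 + \kappa^2k^2}. \qquad (29.39)$$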


Let’s do the $k_0$ integration first. Multiply the numerator and denominator by $-i$ to change the denominator to $k_0 - i\kappa^2k^2$; then use the calculus of residues and choose the contour in the UHP (assuming that $x_0 - x_0' > 0$). The only pole of the integrand is at $k_0 = i\kappa^2k^2$. Thus,

$$\int_{-\infty}^{\infty} dk_0\,\frac{e^{ik_0(x_0 - x_0')}}{ik_0 + \kappa^2k^2} = -i \int_{-\infty}^{\infty} dk_0\,\frac{e^{ik_0(x_0 - x_0')}}{k_0 - i\kappa^2k^2} = -i\left(2\pi i\,e^{-\kappa^2k^2(x_0 - x_0')}\right) = 2\pi\,e^{-\kappa^2k^2(x_0 - x_0')}.$$


Substituting this in (29.39) and using spherical coordinates in the three-dimensional $\mathbf{k}$-space with the polar axis along $\mathbf{r} - \mathbf{r}'$ yields

$$\begin{aligned}
G(x - x') &= \frac{1}{(2\pi)^3} \int d^3k\,e^{i\mathbf{k}\cdot(\mathbf{r} - \mathbf{r}')}\,e^{-\kappa^2k^2(x_0 - x_0')} \\
&= \frac{1}{(2\pi)^3} \int_0^\infty k^2 e^{-\kappa^2k^2(x_0 - x_0')}\,dk \int_0^\pi \sin\theta\,d\theta \int_0^{2\pi} d\varphi\,e^{ik|\mathbf{r} - \mathbf{r}'|\cos\theta}.
\end{aligned}$$


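The $\varphi$ integration yields $2\pi$, and as in the Laplacian case, the $\theta$ integration gives $2\sin(k|\mathbf{r} - \mathbf{r}'|)/(k|\mathbf{r} - \mathbf{r}'|)$; since the resulting integrand of the $k$ integral is even, we can extend the lower limit of integration to $-\infty$ and introduce a factor of one-half. Thus, the equation above becomes

$$G(x - x') = \frac{2}{(2\pi)^2|\mathbf{r} - \mathbf{r}'|} \int_0^\infty k\,e^{-\kappa^2k^2(x_0 - x_0')}\sin(k|\mathbf{r} - \mathbf{r}'|)\,dk = \frac{1}{(2\pi)^2|\mathbf{r} - \mathbf{r}'|} \int_{-\infty}^{\infty} k\,e^{-\kappa^2k^2(x_0 - x_0')}\sin(k|\mathbf{r} - \mathbf{r}'|)\,dk,$$

or, since sine is the imaginary part of the complex exponential,

$$G(x - x') = \frac{1}{(2\pi)^2|\mathbf{r} - \mathbf{r}'|}\,\mathrm{Im} \int_{-\infty}^{\infty} k\,e^{-\kappa^2k^2(x_0 - x_0') + ik|\mathbf{r} - \mathbf{r}'|}\,dk. \qquad (29.40)$$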

Completing the square in the exponent, we have

$$-\kappa^2k^2(x_0 - x_0') + ik|\mathbf{r} - \mathbf{r}'| = -\kappa^2(x_0 - x_0')\left(k - \frac{i|\mathbf{r} - \mathbf{r}'|}{2\kappa^2(x_0 - x_0')}\right)^2 - \frac{|\mathbf{r} - \mathbf{r}'|^2}{4\kappa^2(x_0 - x_0')}.$$

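Call the imaginary number in the large parentheses $i\alpha$, i.e., $\alpha = |\mathbf{r} - \mathbf{r}'|/[2\kappa^2(x_0 - x_0')]$, and substitute the result in (29.40) to obtain

$$G(x - x') = \frac{e^{-|\mathbf{r} - \mathbf{r}'|^2/[4\kappa^2(x_0 - x_0')]}}{(2\pi)^2|\mathbf{r} - \mathbf{r}'|}\,\mathrm{Im} \int_{-\infty}^{\infty} k\,e^{-\kappa^2(x_0 - x_0')(k - i\alpha)^2}\,dk. \qquad (29.41)$$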

Change the variable of integration to $u = k - i\alpha$. Then the integral becomes

$$\int_{-\infty}^{\infty} (u + i\alpha)\,e^{-\kappa^2(x_0 - x_0')u^2}\,du = i\alpha \int_{-\infty}^{\infty} e^{-\kappa^2(x_0 - x_0')u^2}\,du = i\alpha\sqrt{\frac{\pi}{\kappa^2(x_0 - x_0')}}.$$

The integral involving the $u$ of $(u + i\alpha)$ vanishes because the integrand is odd. The Gaussian integral was evaluated in Example 3.3.1. Substituting this and the value of $\alpha$ in (29.41), we obtain


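$$G(x - x') = \frac{e^{-|\mathbf{r} - \mathbf{r}'|^2/[4\kappa^2(x_0 - x_0')]}}{(2\pi)^2|\mathbf{r} - \mathbf{r}'|}\;\frac{|\mathbf{r} - \mathbf{r}'|}{2\kappa^2(x_0 - x_0')}\sqrt{\frac{\pi}{\kappa^2(x_0 - x_0')}},$$

or, recalling that $x_0 = t$ and assuming that $x_0' = t' = 0$, we obtain the final form of the Green’s function for the heat equation:

$$G(\mathbf{r} - \mathbf{r}';t) = \frac{e^{-|\mathbf{r} - \mathbf{r}'|^2/(4\kappa^2t)}}{(4\pi\kappa^2t)^{3/2}}. \qquad (29.42)$$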
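Two properties of (29.42) are easy to confirm numerically: the total heat $\int G\,d^3x$ equals 1 for every $t > 0$, and $G$ satisfies (29.38) away from the origin. The sketch below checks both, using the radial form of the Laplacian, $\nabla^2 G = (1/R)\,\partial^2(RG)/\partial R^2$ with $R = |\mathbf r - \mathbf r'|$, and central finite differences; the helper names and parameter values are illustrative.

```python
import math

def simpson(f, a, b, n=4000):
    """Composite Simpson rule for the integral of f over [a, b]; n is rounded up to even."""
    n += n % 2
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def heat_green_3d(R, t, kappa=1.0):
    """Eq. (29.42) as a function of R = |r - r'| and t > 0."""
    a = 4 * kappa * kappa * t
    return math.exp(-R * R / a) / (math.pi * a) ** 1.5

def total_heat(t, kappa=1.0):
    """4 pi int_0^inf R^2 G dR, truncated where the Gaussian factor is exp(-36)."""
    R_max = 12 * kappa * math.sqrt(t)
    return simpson(lambda R: 4 * math.pi * R * R * heat_green_3d(R, t, kappa), 0.0, R_max)

def heat_residual(R, t, kappa=1.0, h=1e-4):
    """dG/dt - kappa^2 lap G at R > 0, with lap G = (1/R) d^2(R G)/dR^2 by central differences."""
    u = lambda RR: RR * heat_green_3d(RR, t, kappa)
    dG_dt = (heat_green_3d(R, t + h, kappa) - heat_green_3d(R, t - h, kappa)) / (2 * h)
    lap = (u(R + h) - 2 * u(R) + u(R - h)) / (h * h * R)
    return dG_dt - kappa * kappa * lap

print(total_heat(0.5, kappa=0.8), heat_residual(0.7, 0.5, kappa=0.8))
```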
29.2.3 Green’s Function for the Wave Equation

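The wave equation, which we write as

$$\nabla^2\Psi - \frac{1}{c^2}\frac{\partial^2\Psi}{\partial t^2} = 0, \qquad (29.43)$$

with $c$ the speed of the wave, is a PDE in four variables. As in the case of the heat equation, we let the fourth variable have 0 as subscript. Then

$$p(k_j) = \frac{k_0^2}{c^2} - k_1^2 - k_2^2 - k_3^2 = \frac{k_0^2}{c^2} - k^2,$$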

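and the Green’s function can be written as

$$G(x - x') = \frac{1}{(2\pi)^4} \int d^4k\,\frac{e^{ik_0(x_0 - x_0') + i\mathbf{k}\cdot(\mathbf{r} - \mathbf{r}')}}{k_0^2/c^2 - k^2} = \frac{c^2}{(2\pi)^4} \int d^3k\,e^{i\mathbf{k}\cdot(\mathbf{r} - \mathbf{r}')} \int_{-\infty}^{\infty} dk_0\,\frac{e^{ik_0t}}{k_0^2 - c^2k^2}, \qquad (29.44)$$

where we substituted $t$ for $x_0$ and assumed $x_0' = t' = 0$.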

Let us concentrate on the $k_0$ integration and use the calculus of residues to calculate it. The integrand has two poles, $k_0 = \pm ck$, on the real axis, and depending on how these poles are handled, different Green’s functions are obtained. One way to handle the poles is to move them up slightly, i.e., give them an infinitesimal positive imaginary part. If $t > 0$, the contour of integration should be in the UHP, with zero contribution from the large circle there. If $t < 0$, the contour of integration should be in the LHP, for which the integral vanishes because there are no poles inside the contour. Thus, denoting the integrand by $f$, we have

$$\int_{-\infty}^{\infty} dk_0\,\frac{e^{ik_0t}}{k_0^2 - c^2k^2} = 2\pi i\left[\mathrm{Res}\,f(ck) + \mathrm{Res}\,f(-ck)\right].$$


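But

$$\mathrm{Res}\,f(ck) = \lim_{k_0 \to ck} (k_0 - ck)\,\frac{e^{ik_0t}}{k_0^2 - c^2k^2} = \lim_{k_0 \to ck} \frac{e^{ik_0t}}{k_0 + ck} = \frac{e^{ickt}}{2ck}.$$

Similarly, $\mathrm{Res}\,f(-ck) = -e^{-ickt}/(2ck)$, and the $k_0$ integral gives

$$\int_{-\infty}^{\infty} dk_0\,\frac{e^{ik_0t}}{k_0^2 - c^2k^2} = 2\pi i\left(\frac{e^{ickt}}{2ck} - \frac{e^{-ickt}}{2ck}\right) = -2\pi\,\frac{\sin ckt}{ck}.$$

Substituting this in (29.44) yields

$$G(x - x') = -\frac{c}{(2\pi)^3} \int d^3k\,e^{i\mathbf{k}\cdot(\mathbf{r} - \mathbf{r}')}\,\frac{\sin ckt}{k},$$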
ck 
which, through a by-now-familiar routine of spherical integration in $\mathbf{k}$-space, yields

$$\begin{aligned}
G(\mathbf{r} - \mathbf{r}';t) &= -\frac{2c}{(2\pi)^2|\mathbf{r} - \mathbf{r}'|} \int_0^\infty dk\,\sin(k|\mathbf{r} - \mathbf{r}'|)\,\sin ckt \\
&= -\frac{c}{(2\pi)^2|\mathbf{r} - \mathbf{r}'|} \int_{-\infty}^{\infty} dk\,\frac{e^{ik|\mathbf{r} - \mathbf{r}'|} - e^{-ik|\mathbf{r} - \mathbf{r}'|}}{2i}\;\frac{e^{ickt} - e^{-ickt}}{2i}.
\end{aligned}$$

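Multiply the exponentials and note that $\int_{-\infty}^{\infty} e^{-ix}\,dx = \int_{-\infty}^{\infty} e^{ix}\,dx$ to obtain

$$\begin{aligned}
G(\mathbf{r} - \mathbf{r}';t) &= \frac{c}{2(2\pi)^2|\mathbf{r} - \mathbf{r}'|} \int_{-\infty}^{\infty} dk\left[e^{ick(t + |\mathbf{r} - \mathbf{r}'|/c)} - e^{ick(t - |\mathbf{r} - \mathbf{r}'|/c)}\right] \\
&= \frac{1}{2(2\pi)^2|\mathbf{r} - \mathbf{r}'|}\left[2\pi\,\delta(t + |\mathbf{r} - \mathbf{r}'|/c) - 2\pi\,\delta(t - |\mathbf{r} - \mathbf{r}'|/c)\right], \qquad (29.45)
\end{aligned}$$

where in the last step we used $\int_{-\infty}^{\infty} e^{ick\tau}\,dk = (2\pi/c)\,\delta(\tau)$.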


The first delta function vanishes because $t > 0$. Therefore, the final form of the Green’s function for the wave equation is

$$G_{\text{ret}}(\mathbf{r} - \mathbf{r}';t) = -\frac{\delta(t - |\mathbf{r} - \mathbf{r}'|/c)}{4\pi|\mathbf{r} - \mathbf{r}'|}. \qquad (29.46)$$

The subscript “ret” on the Green’s function stands for retarded. As the argument of the delta function implies, $G_{\text{ret}}(\mathbf{r} - \mathbf{r}';t)$ is zero unless $t = |\mathbf{r} - \mathbf{r}'|/c$, i.e., unless the wave has had time to move from the source point $\mathbf{r}'$ to the observation point $\mathbf{r}$. The signal is “retarded” by this amount of time. Had we given the poles of the $k_0$ integral of (29.44) an infinitesimal negative imaginary part and chosen $t$ to be negative, the first delta function of (29.45) would have survived (closing the contour in the LHP also reverses its orientation and hence the overall sign), and we would have obtained the advanced Green’s function:

$$G_{\text{adv}}(\mathbf{r} - \mathbf{r}';t) = -\frac{\delta(t + |\mathbf{r} - \mathbf{r}'|/c)}{4\pi|\mathbf{r} - \mathbf{r}'|}. \qquad (29.47)$$


29.3 The Laplace Transform 

In the previous sections, the power of the Fourier transform was illustrated through both formalism and applications. The Fourier transform is by far the most important of all the transforms used in mathematical analysis. Another transform, widely used in solving ordinary differential equations, is the Laplace transform, the subject of this section.

Let $f(t)$ be a sufficiently well-behaved function. The Laplace transform of $f$ is another function $\mathcal{L}[f]$ whose value at $s$ is given by

$$\mathcal{L}[f](s) = \int_0^\infty e^{-st} f(t)\,dt. \qquad (29.48)$$

$s$ could be complex, although it is usually taken to be real. To assure the convergence of the integral, the real part of $s$ must be large enough: $\mathrm{Re}\,s > 0$ suffices for a bounded $f$, while $f(t) = e^{at}$ requires $\mathrm{Re}\,s > a$. The left-hand side of (29.48) is usually denoted by $F(s)$. It is also common to write it (less precisely) as $\mathcal{L}[f(t)]$, with the letter $s$ understood.

Example 29.3.1. The Laplace transform of the unit function (the function whose value everywhere is 1), evaluated at $s$, can easily be shown to be $1/s$. The Laplace transform of $e^{i\omega t}$ can be readily calculated as well:

$$F(s) = \int_0^\infty e^{-st}e^{i\omega t}\,dt = \int_0^\infty e^{(-s + i\omega)t}\,dt = \left.\frac{e^{(-s + i\omega)t}}{-s + i\omega}\right|_0^\infty = \frac{1}{s - i\omega}.$$

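The Laplace transforms of $\sin\omega t$ and $\cos\omega t$ (for real $s$) can now be evaluated:

$$\mathcal{L}[\cos\omega t] = \mathrm{Re}\,\mathcal{L}[e^{i\omega t}] = \mathrm{Re}\left(\frac{1}{s - i\omega}\right) = \mathrm{Re}\left(\frac{s + i\omega}{s^2 + \omega^2}\right) = \frac{s}{s^2 + \omega^2} \qquad (29.49)$$

and

$$\mathcal{L}[\sin\omega t] = \mathrm{Im}\,\mathcal{L}[e^{i\omega t}] = \mathrm{Im}\left(\frac{1}{s - i\omega}\right) = \mathrm{Im}\left(\frac{s + i\omega}{s^2 + \omega^2}\right) = \frac{\omega}{s^2 + \omega^2}. \qquad (29.50)$$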

The Laplace transform of the step function $\theta(t - a)$ (with $a \ge 0$) is very useful in applications [see Section 5.1.3 for the definition of the step function]:

$$\mathcal{L}[\theta(t - a)] = \int_0^\infty e^{-st}\,\theta(t - a)\,dt = \int_a^\infty e^{-st}\,dt = \frac{e^{-as}}{s}.$$

The lower limit of integration was changed because $\theta(t - a)$ is zero for $t < a$ (and it is equal to 1 for $t > a$).

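Knowing $\mathcal{L}[1]$, we can find the Laplace transform of any power of $t$ because

$$\mathcal{L}[t^n] = \int_0^\infty t^n e^{-st}\,dt = (-1)^n\frac{d^n}{ds^n}\int_0^\infty e^{-st}\,dt.$$

Since $\mathcal{L}[1](s) = 1/s$, we have

$$\mathcal{L}[t^n] = (-1)^n\frac{d^n}{ds^n}\left(\frac{1}{s}\right) = \frac{n!}{s^{n+1}}. \qquad (29.51)$$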



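What if $n$ in the above equation is not an integer? Let’s evaluate $\mathcal{L}[t^\nu]$ (with $\nu > -1$) directly, using the substitution $u = st$:

$$\mathcal{L}[t^\nu] = \int_0^\infty t^\nu e^{-st}\,dt = \frac{1}{s^{\nu+1}}\int_0^\infty u^\nu e^{-u}\,du = \frac{\Gamma(\nu + 1)}{s^{\nu+1}}, \qquad (29.52)$$

where $\Gamma$ is the gamma function introduced in Section 11.1.1. Note that if $\nu = n$, we regain (29.51) because $\Gamma(n + 1) = n!$. ■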
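Definition (29.48) can be evaluated numerically for real $s > 0$ by truncating the upper limit once $e^{-st}$ has become negligible. The sketch below (hypothetical helper names; it assumes $f$ does not grow fast enough to spoil the truncation) reproduces $\mathcal L[1]$, (29.49), (29.50), (29.51), and the gamma-function result (29.52) for a non-integer power.

```python
import math

def simpson(f, a, b, n=20000):
    """Composite Simpson rule for the integral of f over [a, b]; n is rounded up to even."""
    n += n % 2
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def laplace_numeric(f, s, t_max=None):
    """Truncated quadrature of Eq. (29.48) for real s > 0 (the tail beyond t_max is assumed negligible)."""
    if t_max is None:
        t_max = 50.0 / s  # e^{-s t_max} = e^{-50}, about 2e-22
    return simpson(lambda t: math.exp(-s * t) * f(t), 0.0, t_max)

s, w = 1.5, 2.0
print(laplace_numeric(lambda t: math.cos(w * t), s), s / (s * s + w * w))   # Eq. (29.49)
print(laplace_numeric(lambda t: t ** 2.5, s), math.gamma(3.5) / s ** 3.5)   # Eq. (29.52), nu = 2.5
```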


29.3.1 Properties of Laplace Transform 

In a typical application, one obtains the Laplace transform of a function from, say, a differential equation, and inverts it to find the actual function. This is what was done in the case of the Fourier transform, and indeed it is what is done with any transform. While the formula for inverting a Fourier transform [see Equation (29.7)] is nice and symmetric, that for the Laplace transform is not as nice. Furthermore, the Fourier transform adapts itself very naturally to partial differential equations, as demonstrated in the previous section. However, the adaptation of the Laplace transform to PDEs is not so natural. That is why Fourier transform techniques are much more powerful, both formally and for calculations, than the Laplace transform.

Because of this drawback, one has to rely on some formal properties of the Laplace transform and its inverse, as well as on a large number of examples, to be able to reconstruct the original function.

Linearity 

One such property is the linearity of the Laplace transform and its inverse:

$$\mathcal{L}[af + bg] = a\mathcal{L}[f] + b\mathcal{L}[g], \qquad \mathcal{L}^{-1}[aF + bG] = a\mathcal{L}^{-1}[F] + b\mathcal{L}^{-1}[G]. \qquad (29.53)$$

First shift property 

Another is the first shift property. If $F(s)$ is the Laplace transform of $f(t)$, then $F(s - a)$ is the Laplace transform of $e^{at}f(t)$. This can easily be verified:

$$F(s - a) = \int_0^\infty e^{-(s - a)t}f(t)\,dt = \int_0^\infty e^{-st}\left(e^{at}f(t)\right)dt = \mathcal{L}[e^{at}f(t)]. \qquad (29.54)$$

A more useful way of writing this equation is

$$\mathcal{L}^{-1}[F(s - a)] = e^{at}\,\mathcal{L}^{-1}[F(s)]. \qquad (29.55)$$

Second shift property 

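The second shift property involves the step function (with $a \ge 0$):

$$\mathcal{L}[\theta(t - a)f(t - a)] = e^{-as}\,\mathcal{L}[f](s). \qquad (29.56)$$

This is because

$$\mathcal{L}[\theta(t - a)f(t - a)] = \int_a^\infty e^{-st}f(t - a)\,dt = \int_0^\infty e^{-s(\tau + a)}f(\tau)\,d\tau = e^{-as}\,\mathcal{L}[f](s),$$

where in the second equality we changed the variable of integration to $\tau = t - a$. Denoting by $F(s)$ the Laplace transform of $f(t)$, we write (29.56) as

$$\mathcal{L}^{-1}[e^{-as}F(s)] = \theta(t - a)f(t - a) = \begin{cases} f(t - a) & \text{if } t > a, \\ 0 & \text{if } t < a. \end{cases} \qquad (29.57)$$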
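Both shift properties lend themselves to a numerical spot check. In the sketch below, $f(t) = \sin 2t$ with $F(s) = 2/(s^2 + 4)$ is an arbitrary choice and the helper names are illustrative; the quadrature interval is split at $t = a$, where $\theta(t - a)f(t - a)$ has a kink, so that Simpson's rule keeps its accuracy.

```python
import math

def simpson(f, a, b, n=20000):
    """Composite Simpson rule for the integral of f over [a, b]; n is rounded up to even."""
    n += n % 2
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def laplace_numeric(f, s, t_max=60.0, breaks=()):
    """Truncated quadrature of Eq. (29.48), split at 'breaks' where f has a kink or a jump."""
    pts = [0.0] + sorted(b for b in breaks if 0.0 < b < t_max) + [t_max]
    integrand = lambda t: math.exp(-s * t) * f(t)
    return sum(simpson(integrand, lo, hi) for lo, hi in zip(pts, pts[1:]))

def step(x):
    """Unit step theta(x); its value at x = 0 does not matter for the integrals here."""
    return 1.0 if x > 0 else 0.0

F = lambda s: 2.0 / (s * s + 4.0)  # L[sin 2t](s), Eq. (29.50)
f = lambda t: math.sin(2.0 * t)

# first shift (29.54): L[e^{at} f](s) = F(s - a), here with a = 0.5, s = 2
print(laplace_numeric(lambda t: math.exp(0.5 * t) * f(t), 2.0), F(1.5))
# second shift (29.56): L[theta(t - a) f(t - a)](s) = e^{-as} F(s), here with a = 1.2, s = 2
print(laplace_numeric(lambda t: step(t - 1.2) * f(t - 1.2), 2.0, breaks=(1.2,)), math.exp(-2.4) * F(2.0))
```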

Example 29.3.2. Since $\mathcal{L}[t^n] = n!/s^{n+1}$, using the first shift property, we get

$$\mathcal{L}[t^n e^{at}] = \frac{n!}{(s - a)^{n+1}}. \qquad (29.58)$$
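In particular, if $n = 0$, we have $\mathcal{L}[e^{at}] = 1/(s - a)$. From this, and the linearity property, we can find the Laplace transforms of the hyperbolic sine:

$$\mathcal{L}[\sinh\gamma t] = \mathcal{L}\left[\tfrac{1}{2}(e^{\gamma t} - e^{-\gamma t})\right] = \tfrac{1}{2}\left(\mathcal{L}[e^{\gamma t}] - \mathcal{L}[e^{-\gamma t}]\right) = \frac{1}{2}\left(\frac{1}{s - \gamma} - \frac{1}{s + \gamma}\right) = \frac{\gamma}{s^2 - \gamma^2} \qquad (29.59)$$

and hyperbolic cosine:

$$\mathcal{L}[\cosh\gamma t] = \mathcal{L}\left[\tfrac{1}{2}(e^{\gamma t} + e^{-\gamma t})\right] = \tfrac{1}{2}\left(\mathcal{L}[e^{\gamma t}] + \mathcal{L}[e^{-\gamma t}]\right) = \frac{1}{2}\left(\frac{1}{s - \gamma} + \frac{1}{s + \gamma}\right) = \frac{s}{s^2 - \gamma^2}. \qquad (29.60)$$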


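With our accumulated knowledge of the Laplace transform, let’s see if we can find the inverse transform of $1/(s^2 + 2as + b^2)$. Complete the square in the denominator and consider three cases: $b > a$, $b < a$, and $b = a$. First assume $b > a$ and define $\omega^2 = b^2 - a^2$. Then

$$\mathcal{L}^{-1}\left[\frac{1}{s^2 + 2as + b^2}\right] = \mathcal{L}^{-1}\left[\frac{1}{(s + a)^2 + b^2 - a^2}\right] = \frac{1}{\omega}\mathcal{L}^{-1}\left[\frac{\omega}{(s + a)^2 + \omega^2}\right] = \frac{e^{-at}}{\omega}\mathcal{L}^{-1}\left[\frac{\omega}{s^2 + \omega^2}\right] = \frac{e^{-at}}{\omega}\sin\omega t,$$

where we used (29.55) and (29.50). Substituting for $\omega$, we get

$$\mathcal{L}^{-1}\left[\frac{1}{s^2 + 2as + b^2}\right] = \frac{e^{-at}}{\sqrt{b^2 - a^2}}\sin\left(\sqrt{b^2 - a^2}\,t\right), \qquad a < b.$$

For $b < a$ define $\gamma^2 = a^2 - b^2$. Then

$$\mathcal{L}^{-1}\left[\frac{1}{s^2 + 2as + b^2}\right] = \mathcal{L}^{-1}\left[\frac{1}{(s + a)^2 + b^2 - a^2}\right] = \frac{1}{\gamma}\mathcal{L}^{-1}\left[\frac{\gamma}{(s + a)^2 - \gamma^2}\right] = \frac{e^{-at}}{\gamma}\mathcal{L}^{-1}\left[\frac{\gamma}{s^2 - \gamma^2}\right] = \frac{e^{-at}}{\gamma}\sinh\gamma t,$$

where we used (29.55) and (29.59). Substituting for $\gamma$, we get

$$\mathcal{L}^{-1}\left[\frac{1}{s^2 + 2as + b^2}\right] = \frac{e^{-at}}{\sqrt{a^2 - b^2}}\sinh\left(\sqrt{a^2 - b^2}\,t\right), \qquad a > b.$$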


If $b = a$, then the denominator is a complete square and

$$\mathcal{L}^{-1}\left[\frac{1}{(s + a)^2}\right] = t\,e^{-at}$$

by (29.58).

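Similarly, we can show that

$$\mathcal{L}^{-1}\left[\frac{s}{s^2 + 2as + b^2}\right] = e^{-at}\cos\left(\sqrt{b^2 - a^2}\,t\right) - \frac{a\,e^{-at}}{\sqrt{b^2 - a^2}}\sin\left(\sqrt{b^2 - a^2}\,t\right), \qquad b > a,$$

$$\mathcal{L}^{-1}\left[\frac{s}{s^2 + 2as + b^2}\right] = e^{-at}\cosh\left(\sqrt{a^2 - b^2}\,t\right) - \frac{a\,e^{-at}}{\sqrt{a^2 - b^2}}\sinh\left(\sqrt{a^2 - b^2}\,t\right), \qquad b < a.$$

We shall use the formulas derived in this example in solving differential equations. ■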
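The case analysis above translates directly into code. The sketch below (illustrative names, real $a$ and $b$ assumed) returns the inverse transforms of $1/(s^2 + 2as + b^2)$ and $s/(s^2 + 2as + b^2)$ in each of the three cases; the second uses $s/[(s+a)^2 \pm \cdots] = (s+a)/[\cdots] - a/[\cdots]$. The check transforms them back numerically with a truncated form of (29.48).

```python
import math

def simpson(f, a, b, n=20000):
    """Composite Simpson rule for the integral of f over [a, b]; n is rounded up to even."""
    n += n % 2
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def laplace_numeric(f, s, t_max=None):
    """Truncated quadrature of Eq. (29.48) for real s > 0."""
    if t_max is None:
        t_max = 50.0 / s
    return simpson(lambda t: math.exp(-s * t) * f(t), 0.0, t_max)

def inv_one(t, a, b):
    """L^{-1}[1/(s^2 + 2 a s + b^2)] at time t, for the cases b > a, b < a, b = a."""
    if b > a:
        w = math.sqrt(b * b - a * a)
        return math.exp(-a * t) * math.sin(w * t) / w
    if b < a:
        g = math.sqrt(a * a - b * b)
        return math.exp(-a * t) * math.sinh(g * t) / g
    return t * math.exp(-a * t)

def inv_s(t, a, b):
    """L^{-1}[s/(s^2 + 2 a s + b^2)] = e^{-at} (cos, cosh, or 1) - a * inv_one."""
    if b > a:
        return math.exp(-a * t) * math.cos(math.sqrt(b * b - a * a) * t) - a * inv_one(t, a, b)
    if b < a:
        return math.exp(-a * t) * math.cosh(math.sqrt(a * a - b * b) * t) - a * inv_one(t, a, b)
    return math.exp(-a * t) * (1 - a * t)

s = 2.0
for a, b in ((1.0, 2.0), (1.0, 0.5), (1.0, 1.0)):
    print(a, b, laplace_numeric(lambda t: inv_one(t, a, b), s), 1 / (s * s + 2 * a * s + b * b))
```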
Periodic functions


Although Fourier series are better suited for periodic functions, the Laplace transform of periodic functions is also of interest. If $f(t)$ is periodic of period $T$, i.e., if $f(t + T) = f(t)$, then

$$\begin{aligned}
\mathcal{L}[f(t)] &= \underbrace{\int_0^T e^{-st}f(t)\,dt}_{\text{call this } F_1(s)} + \int_T^\infty e^{-st}f(t)\,dt = F_1(s) + \int_0^\infty e^{-s(u + T)}\underbrace{f(u + T)}_{=\,f(u)}\,du \\
&= F_1(s) + e^{-sT}\int_0^\infty e^{-su}f(u)\,du = F_1(s) + e^{-sT}\int_0^\infty e^{-st}f(t)\,dt.
\end{aligned}$$

We thus have $\mathcal{L}[f(t)] = F_1(s) + e^{-sT}\mathcal{L}[f(t)]$, which upon solving for $\mathcal{L}[f(t)]$ yields

$$\mathcal{L}[f(t)] = \frac{F_1(s)}{1 - e^{-sT}}. \qquad (29.61)$$

Example 29.3.3. The Laplace transform of the square wave function of Example 10.6.1, defined by

$$V(t) = \begin{cases} 0 & \text{if } 0 < t < T, \\ V_0 & \text{if } T < t < 2T, \end{cases}$$

can be readily found. We simply note that the period is $2T$ and $F_1(s)$ is

$$F_1(s) = \int_0^{2T} e^{-st}V(t)\,dt = V_0\int_T^{2T} e^{-st}\,dt = \frac{V_0}{s}\left(e^{-sT} - e^{-2sT}\right).$$

Substituting this in (29.61), we obtain

$$\mathcal{L}[V(t)] = \frac{1}{1 - e^{-2sT}}\,\frac{V_0}{s}\left(e^{-sT} - e^{-2sT}\right) = \frac{V_0e^{-sT}(1 - e^{-sT})}{s(1 - e^{-sT})(1 + e^{-sT})} = \frac{V_0e^{-sT}}{s(1 + e^{-sT})} = \frac{V_0}{s(1 + e^{sT})}.$$
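The result can be checked in two independent ways: through (29.61) with a numerically computed $F_1(s)$, and by brute-force summation of $\int e^{-st}V_0\,dt$ over the intervals $((2m+1)T, (2m+2)T)$ on which the wave is on. The sketch assumes the square wave defined above; names and parameter values are illustrative.

```python
import math

def simpson(f, a, b, n=2000):
    """Composite Simpson rule for the integral of f over [a, b]; n is rounded up to even."""
    n += n % 2
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def F1(s, V0, T):
    """Integral of e^{-st} V(t) over one period 2T; V vanishes on (0, T), so only (T, 2T) contributes."""
    return simpson(lambda t: V0 * math.exp(-s * t), T, 2 * T)

def laplace_periodic(s, V0, T):
    """Eq. (29.61) with period 2T."""
    return F1(s, V0, T) / (1 - math.exp(-2 * s * T))

def laplace_by_periods(s, V0, T, periods=200):
    """Brute-force sum over the 'on' intervals ((2m+1)T, (2m+2)T)."""
    return sum(simpson(lambda t: V0 * math.exp(-s * t), (2 * m + 1) * T, (2 * m + 2) * T, 200)
               for m in range(periods))

def closed_form(s, V0, T):
    return V0 / (s * (1 + math.exp(s * T)))

print(laplace_periodic(0.5, 3.0, 1.2), laplace_by_periods(0.5, 3.0, 1.2), closed_form(0.5, 3.0, 1.2))
```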


Convolution 

The convolution of two functions is defined as

$$(f * g)(t) = \int_0^t f(u)\,g(t - u)\,du.$$

Let $v = t - u$ and change the variable of integration to $v$. Then

$$(f * g)(t) = \int_t^0 f(t - v)\,g(v)\,(-dv) = \int_0^t g(v)\,f(t - v)\,dv = (g * f)(t),$$

showing that convolution is commutative. Commutativity is only one of the following properties of convolution:
Example 29.3.5. It is not easy to find the inverse transform of $F(s) = \ln[(s + a)/(s + b)]$ directly. But the inverse transform of

$$F'(s) = \frac{d}{ds}\left[\ln(s + a) - \ln(s + b)\right] = \frac{1}{s + a} - \frac{1}{s + b}$$

is much easier to find. In fact,

$$\mathcal{L}^{-1}[F'(s)] = \mathcal{L}^{-1}\left[\frac{1}{s + a}\right] - \mathcal{L}^{-1}\left[\frac{1}{s + b}\right] = e^{-at} - e^{-bt},$$

by (29.58) with $n = 0$. Therefore, according to (29.63),

$$\mathcal{L}^{-1}\left[\ln\frac{s + a}{s + b}\right] = \frac{e^{-bt} - e^{-at}}{t}.$$
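A numerical check of this result: the sketch below transforms $f(t) = (e^{-bt} - e^{-at})/t$ by truncated quadrature and compares the result with $\ln[(s+a)/(s+b)]$. It uses `math.expm1` to avoid the cancellation in the numerator near $t = 0$, where $f$ tends to $a - b$. Parameter values and helper names are illustrative.

```python
import math

def simpson(f, a, b, n=20000):
    """Composite Simpson rule for the integral of f over [a, b]; n is rounded up to even."""
    n += n % 2
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def laplace_numeric(f, s, t_max=None):
    """Truncated quadrature of Eq. (29.48) for real s > 0."""
    if t_max is None:
        t_max = 50.0 / s
    return simpson(lambda t: math.exp(-s * t) * f(t), 0.0, t_max)

def f_ln(t, a, b):
    """(e^{-bt} - e^{-at}) / t, computed with expm1 to avoid cancellation; f(0) = a - b."""
    if t == 0.0:
        return a - b
    return (math.expm1(-b * t) - math.expm1(-a * t)) / t

print(laplace_numeric(lambda t: f_ln(t, 3.0, 1.0), 2.0), math.log(5.0 / 3.0))
```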


One can also find the primitive (antiderivative, indefinite integral) of $F(s)$. Recall that the indefinite integral of a function can be written as a definite integral with one of its limits being a variable [see Equation (3.18)]. Therefore, let’s write the indefinite integral of $F(s)$ as $-\int_\infty^s F(u)\,du = \int_s^\infty F(u)\,du$. This integral can be easily evaluated:

$$\int_s^\infty F(u)\,du = \int_s^\infty du\int_0^\infty e^{-ut}f(t)\,dt = \int_0^\infty f(t)\,dt\int_s^\infty e^{-ut}\,du = \int_0^\infty e^{-st}\,\frac{f(t)}{t}\,dt.$$

This can be written as

$$\int_s^\infty \mathcal{L}[f](u)\,du = \mathcal{L}\left[\frac{f(t)}{t}\right](s). \qquad (29.64)$$


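Example 29.3.6. Let’s use (29.64) to find the Laplace transform of $\sin\omega t/t$. From (29.50), we have

$$\mathcal{L}\left[\frac{\sin\omega t}{t}\right] = \int_s^\infty \frac{\omega}{u^2 + \omega^2}\,du = \tan^{-1}(\infty) - \tan^{-1}\left(\frac{s}{\omega}\right) = \tan^{-1}\left(\frac{\omega}{s}\right)$$

(see Problem 29.17 for the last equality). Similarly,

$$\mathcal{L}\left[\frac{\sinh\gamma t}{t}\right] = \int_s^\infty \frac{\gamma}{u^2 - \gamma^2}\,du = \frac{1}{2}\int_s^\infty\left(\frac{1}{u - \gamma} - \frac{1}{u + \gamma}\right)du = \frac{1}{2}\lim_{x \to \infty}\left(\ln\frac{x - \gamma}{x + \gamma} - \ln\frac{s - \gamma}{s + \gamma}\right) = \frac{1}{2}\ln\frac{s + \gamma}{s - \gamma}.$$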
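Both results of this example can be confirmed by direct quadrature of (29.48); at $t = 0$ the integrands are replaced by their limits $\omega$ and $\gamma$. The sketch below is illustrative (hypothetical helper names, arbitrary parameters); note that $\mathcal L[\sinh\gamma t/t]$ requires $s > \gamma$, and the truncation point is chosen with the slower decay rate $s - \gamma$ in mind.

```python
import math

def simpson(f, a, b, n=20000):
    """Composite Simpson rule for the integral of f over [a, b]; n is rounded up to even."""
    n += n % 2
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def laplace_numeric(f, s, t_max=None):
    """Truncated quadrature of Eq. (29.48) for real s > 0."""
    if t_max is None:
        t_max = 50.0 / s
    return simpson(lambda t: math.exp(-s * t) * f(t), 0.0, t_max)

def sin_over_t(t, w):
    return w if t == 0.0 else math.sin(w * t) / t

def sinh_over_t(t, g):
    return g if t == 0.0 else math.sinh(g * t) / t

print(laplace_numeric(lambda t: sin_over_t(t, 2.0), 1.0), math.atan(2.0))
print(laplace_numeric(lambda t: sinh_over_t(t, 1.0), 2.0, t_max=60.0), 0.5 * math.log(3.0))
```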


29.3.3 Laplace Transform and Differential Equations 

Certain differential equations with appropriate boundary conditions or initial values can be nicely solved by Laplace transform techniques. For the application of the Laplace transform to differential equations, we need to know the transform of the derivative of a function. Using integration by parts, we have

$$\int_0^\infty e^{-st}f'(t)\,dt = e^{-st}f(t)\Big|_0^\infty + s\int_0^\infty e^{-st}f(t)\,dt = -f(0) + s\,\mathcal{L}[f](s).$$
Therefore,

$$\mathcal{L}[f'](s) = s\,\mathcal{L}[f](s) - f(0). \qquad (29.65)$$

This can be iterated to give

$$\mathcal{L}[f''](s) = s\,\mathcal{L}[f'](s) - f'(0) = s\left[s\,\mathcal{L}[f](s) - f(0)\right] - f'(0)$$

or

$$\mathcal{L}[f''](s) = s^2\mathcal{L}[f](s) - sf(0) - f'(0). \qquad (29.66)$$

We can continue iterating the formula, but since most differential equations encountered in applications are of second order, we stop at the second derivative.

To solve a differential equation, take the Laplace transform of both sides and use (29.65) and (29.66). Solve for $\mathcal{L}[f](s)$ and take the inverse transform to find the solution. Let's look at a specific example. Consider a mass m attached to a spring of spring constant k. The differential equation of motion of this system is

$$m\ddot x + kx = 0 \quad\text{or}\quad \ddot x + \omega_0^2 x = 0, \qquad \omega_0 = \sqrt{k/m}.$$

Taking the Laplace transform of both sides gives

$$\mathcal{L}[\ddot x](s) + \omega_0^2\,\mathcal{L}[x](s) = 0.$$

Using (29.66), this becomes

$$s^2\mathcal{L}[x](s) - s\,x(0) - \dot x(0) + \omega_0^2\,\mathcal{L}[x](s) = 0,$$

or, letting $x_0 = x(0)$ and $\dot x_0 = \dot x(0)$, we get

$$(s^2 + \omega_0^2)\,\mathcal{L}[x](s) = s x_0 + \dot x_0 \quad\text{or}\quad \mathcal{L}[x](s) = \frac{s x_0 + \dot x_0}{s^2 + \omega_0^2},$$

and from (29.49) and (29.50) we obtain

$$x(t) = x_0\cos\omega_0 t + \frac{\dot x_0}{\omega_0}\sin\omega_0 t.$$

Note how the initial values are automatically included in the solution.
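As a check that the transformed solution and its inverse agree, the sketch below numerically computes the Laplace transform of $x_0\cos\omega_0 t + (\dot x_0/\omega_0)\sin\omega_0 t$ and compares it with $(s x_0 + \dot x_0)/(s^2 + \omega_0^2)$. The parameter values are hypothetical, and the transform is truncated at a finite `t_max`.

```python
import math

def laplace(f, s, t_max=60.0, n=60000):
    """Composite Simpson approximation of the Laplace transform truncated at t_max (n even)."""
    h = t_max / n
    total = f(0.0) + math.exp(-s * t_max) * f(t_max)
    for i in range(1, n):
        t = i * h
        total += (4 if i % 2 else 2) * math.exp(-s * t) * f(t)
    return total * h / 3

OMEGA0, X0, V0 = 2.0, 1.0, 0.5   # hypothetical omega_0, x(0), and xdot(0)

def x(t):
    # solution obtained by inverting L[x](s) = (s x0 + v0) / (s^2 + omega0^2)
    return X0 * math.cos(OMEGA0 * t) + V0 / OMEGA0 * math.sin(OMEGA0 * t)

results = [(laplace(x, s), (s * X0 + V0) / (s**2 + OMEGA0**2)) for s in (0.5, 1.0, 3.0)]
```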

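A more general problem has a damping term as well as a driving force. This leads to a differential equation of the form

$$\ddot x + \gamma\dot x + \omega_0^2 x = f(t). \qquad (29.67)$$

Take the Laplace transform of both sides and use (29.65) and (29.66) to get

$$s^2\mathcal{L}[x](s) - x_0 s - \dot x_0 + \gamma\left\{s\,\mathcal{L}[x](s) - x_0\right\} + \omega_0^2\,\mathcal{L}[x](s) = \mathcal{L}[f](s),$$

where $x_0 = x(0)$ and $\dot x_0 = \dot x(0)$. Therefore,

$$(s^2 + \gamma s + \omega_0^2)\,\mathcal{L}[x](s) = \mathcal{L}[f](s) + x_0 s + \dot x_0 + \gamma x_0,$$

which yields

$$\mathcal{L}[x](s) = \frac{\mathcal{L}[f](s) + x_0 s + \dot x_0 + \gamma x_0}{s^2 + \gamma s + \omega_0^2}.$$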


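The solution can be obtained by inversion once we know $\mathcal{L}[f](s)$. Symbolically, we write

$$x(t) = \mathcal{L}^{-1}\left[\frac{\mathcal{L}[f](s)}{s^2 + \gamma s + \omega_0^2}\right] + (\dot x_0 + \gamma x_0)\,\mathcal{L}^{-1}\left[\frac{1}{s^2 + \gamma s + \omega_0^2}\right] + x_0\,\mathcal{L}^{-1}\left[\frac{s}{s^2 + \gamma s + \omega_0^2}\right]. \qquad (29.68)$$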


We consider only the case of an underdamped harmonic oscillator, i.e., that $\omega_0 > \gamma/2$. The second and third inversions are given in Example 29.3.2 with $a = \gamma/2$ and $b = \omega_0$. Then, with $\Omega = \sqrt{\omega_0^2 - (\gamma/2)^2}$, we have

$$\mathcal{L}^{-1}\left[\frac{s}{s^2 + \gamma s + \omega_0^2}\right] = e^{-\gamma t/2}\cos\Omega t - \frac{\gamma}{2\Omega}e^{-\gamma t/2}\sin\Omega t, \qquad \mathcal{L}^{-1}\left[\frac{1}{s^2 + \gamma s + \omega_0^2}\right] = \frac{1}{\Omega}e^{-\gamma t/2}\sin\Omega t. \qquad (29.69)$$

Substituting these in (29.68), we obtain

$$x(t) = e^{-\gamma t/2}\left(x_0\cos\Omega t + \frac{\dot x_0 + \gamma x_0/2}{\Omega}\sin\Omega t\right) + \mathcal{L}^{-1}\left[\frac{\mathcal{L}[f](s)}{s^2 + \gamma s + \omega_0^2}\right]. \qquad (29.70)$$

Let us denote the last term of this equation by $\Phi(t)$ and evaluate the equation at t = 0 to obtain $x(0) = x_0 + \Phi(0)$, implying that $\Phi(0) = 0$. Similarly, differentiating the equation and evaluating the result at t = 0 yields $\dot x(0) = \dot x_0 + \dot\Phi(0)$, implying that $\dot\Phi(0) = 0$. This is an interesting result since f(t) is quite arbitrary! The following example looks at a specific instance of this result.


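Example 29.3.7. As an example of the general formula (29.70), let's consider a damped harmonic oscillator driven by a sinusoidal source $f(t) = A\sin\omega_0 t$ operating at the natural frequency of the oscillator as given in (29.67). Then by (29.50)

$$\mathcal{L}[f](s) = \mathcal{L}[A\sin\omega_0 t] = \frac{A\omega_0}{s^2 + \omega_0^2},$$

and the last term of (29.70) becomes

$$\mathcal{L}^{-1}\left[\frac{A\omega_0}{(s^2 + \omega_0^2)(s^2 + \gamma s + \omega_0^2)}\right].$$

Using partial fraction techniques, we can write this as

$$\frac{A\omega_0}{(s^2 + \omega_0^2)(s^2 + \gamma s + \omega_0^2)} = \frac{A}{\gamma\omega_0}\,\frac{s}{s^2 + \gamma s + \omega_0^2} + \frac{A}{\omega_0}\,\frac{1}{s^2 + \gamma s + \omega_0^2} - \frac{A}{\gamma\omega_0}\,\frac{s}{s^2 + \omega_0^2}.$$

Each term can now be inverted using the results we have obtained in several examples. Denoting the final result by $\Phi(t)$, we get

$$\Phi(t) = \mathcal{L}^{-1}\left[\frac{\mathcal{L}[f](s)}{s^2 + \gamma s + \omega_0^2}\right] = -\frac{A}{\gamma\omega_0}\cos\omega_0 t + \frac{A}{\gamma\omega_0}e^{-\gamma t/2}\left(\cos\Omega t + \frac{\gamma}{2\Omega}\sin\Omega t\right).$$

Note that $\Phi(0) = 0$ as expected from the discussion above. Differentiating, we obtain

$$\dot\Phi(t) = \frac{A}{\gamma}\sin\omega_0 t - \frac{A}{2\omega_0}e^{-\gamma t/2}\left(\cos\Omega t + \frac{\gamma}{2\Omega}\sin\Omega t\right) + \frac{A}{\gamma\omega_0}e^{-\gamma t/2}\left(-\Omega\sin\Omega t + \frac{\gamma}{2}\cos\Omega t\right).$$

It is readily verified that $\dot\Phi(0) = 0$ as explained above. Substituting $\Phi(t)$ for the last term of (29.70) yields

$$x(t) = e^{-\gamma t/2}\left(x_0\cos\Omega t + \frac{\dot x_0 + \gamma x_0/2}{\Omega}\sin\Omega t\right) - \frac{A}{\gamma\omega_0}\cos\omega_0 t + \frac{A}{\gamma\omega_0}e^{-\gamma t/2}\left(\cos\Omega t + \frac{\gamma}{2\Omega}\sin\Omega t\right). \qquad (29.71)$$

After a long time (i.e., as $t \to \infty$), the terms containing an exponential (the so-called transient terms) will be negligible and $x(t) \to -\frac{A}{\gamma\omega_0}\cos\omega_0 t$, as expected from the elementary analysis of the same problem. ■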
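To check (29.71) independently of the Laplace machinery, the sketch below integrates $\ddot x + \gamma\dot x + \omega_0^2 x = A\sin\omega_0 t$ directly with the classical fourth-order Runge-Kutta method and compares the result with the closed form. The integrator is a swapped-in numerical technique, not the book's method. The parameter values (chosen with $\omega_0 > \gamma/2$) and the step size are hypothetical.

```python
import math

def rk4_oscillator(gamma, omega0, force, x0, v0, t_end, dt):
    """Integrate x'' + gamma x' + omega0**2 x = force(t) with classical RK4; return times, positions."""
    def rhs(t, x, v):
        return v, force(t) - gamma * v - omega0**2 * x

    t, x, v = 0.0, x0, v0
    ts, xs = [t], [x]
    for i in range(int(round(t_end / dt))):
        k1x, k1v = rhs(t, x, v)
        k2x, k2v = rhs(t + dt / 2, x + dt / 2 * k1x, v + dt / 2 * k1v)
        k3x, k3v = rhs(t + dt / 2, x + dt / 2 * k2x, v + dt / 2 * k2v)
        k4x, k4v = rhs(t + dt, x + dt * k3x, v + dt * k3v)
        x += dt / 6 * (k1x + 2 * k2x + 2 * k3x + k4x)
        v += dt / 6 * (k1v + 2 * k2v + 2 * k3v + k4v)
        t = (i + 1) * dt
        ts.append(t)
        xs.append(x)
    return ts, xs

def x_closed_form(t, gamma, omega0, A, x0, v0):
    """Equation (29.71) for the drive A sin(omega0 t), assuming omega0 > gamma/2."""
    Omega = math.sqrt(omega0**2 - (gamma / 2) ** 2)
    decay = math.exp(-gamma * t / 2)
    c = A / (gamma * omega0)
    free = decay * (x0 * math.cos(Omega * t) + (v0 + gamma * x0 / 2) / Omega * math.sin(Omega * t))
    driven = -c * math.cos(omega0 * t) + c * decay * (
        math.cos(Omega * t) + gamma / (2 * Omega) * math.sin(Omega * t))
    return free + driven

GAMMA, OMEGA0, A, X0, V0 = 0.4, 2.0, 1.0, 1.0, 0.5   # hypothetical parameter values
ts, xs = rk4_oscillator(GAMMA, OMEGA0, lambda t: A * math.sin(OMEGA0 * t), X0, V0, 10.0, 1e-3)
max_error = max(abs(x - x_closed_form(t, GAMMA, OMEGA0, A, X0, V0)) for t, x in zip(ts, xs))
```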


We can understand this interesting behavior of $\Phi(t)$ in terms of the properties of convolution. Let g(t) be the inverse transform of $1/(s^2 + \gamma s + \omega_0^2)$. Then invoking Box 29.3.1, the last term of (29.70) can be written as

$$\Phi(t) = \mathcal{L}^{-1}\left[\mathcal{L}[f](s)\cdot\mathcal{L}[g](s)\right] = (f * g)(t) = \int_0^t f(u)\,g(t-u)\,du,$$

whose derivative is (see Box 3.2.2)

$$\dot\Phi(t) = f(t)\,g(0) + \int_0^t f(u)\,\dot g(t-u)\,du.$$

It is now clear why $\Phi(0) = 0$. As for the derivative, we see that $\dot\Phi(0) = f(0)g(0)$. But g(t) is given by (29.69), which is clearly 0 at t = 0.
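The convolution picture can also be checked numerically. The sketch below evaluates $(f * g)(t)$ by a composite Simpson rule, with $f(u) = A\sin\omega_0 u$ and $g(t) = e^{-\gamma t/2}\sin\Omega t/\Omega$ from (29.69). It compares the result with the closed-form $\Phi(t)$ of Example 29.3.7 and also confirms $f * g = g * f$. The parameter values are hypothetical.

```python
import math

def simpson(func, a, b, n=2000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3

GAMMA, OMEGA0, A = 0.4, 2.0, 1.0   # hypothetical values with omega0 > gamma/2
OMEGA = math.sqrt(OMEGA0**2 - (GAMMA / 2) ** 2)

f = lambda t: A * math.sin(OMEGA0 * t)                                  # driving force
g = lambda t: math.exp(-GAMMA * t / 2) * math.sin(OMEGA * t) / OMEGA    # g(t) from (29.69)

def convolve(p, q, t):
    """(p * q)(t): integral of p(u) q(t - u) for u from 0 to t."""
    return 0.0 if t == 0.0 else simpson(lambda u: p(u) * q(t - u), 0.0, t)

def phi_closed_form(t):
    """Phi(t) from Example 29.3.7."""
    c = A / (GAMMA * OMEGA0)
    return -c * math.cos(OMEGA0 * t) + c * math.exp(-GAMMA * t / 2) * (
        math.cos(OMEGA * t) + GAMMA / (2 * OMEGA) * math.sin(OMEGA * t))
```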


29.3.4 Inverse of Laplace Transform 

As mentioned earlier, the procedure for inverting a Laplace transform is important in solving differential equations, as the technique, like any other transform, yields the transform of the solution, and to get the solution, one has to invert that transform. So far, we have used various tricks and properties of the Laplace transform to get from $F(s) = \mathcal{L}[f](s)$ to f(t). Now, we provide a general formula that can always be used to yield the function. The procedure is the Mellin inversion integral:

$$f(t) = \frac{1}{2\pi i}\int_{\gamma - i\infty}^{\gamma + i\infty} F(s)\,e^{st}\,ds. \qquad (29.72)$$

The integration is along a line, called the Bromwich contour, parallel to the imaginary axis of the complex s plane. The real number γ is arbitrary as long as the integration line is to the right of all the singularities of F(s). To find the actual value of the integral, one closes the contour with an infinite semicircle to the left of the line and uses the residue theorem.

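To prove that the right-hand side of (29.72) is indeed f(t), substitute the definition of F(s),

$$F(s) = \int_0^\infty f(\tau)\,e^{-s\tau}\,d\tau,$$

in the integral and switch the order of integrations to get

$$\text{RHS} = \frac{1}{2\pi i}\int_0^\infty f(\tau)\,d\tau\,\underbrace{\int_{\gamma - i\infty}^{\gamma + i\infty} e^{s(t-\tau)}\,ds}_{\text{denote this by } J}. \qquad (29.73)$$

Introduce a new variable of integration σ by $s = \gamma + i\sigma$ in the inner integral to get

$$J = \int_{-\infty}^{\infty} e^{(\gamma + i\sigma)(t-\tau)}\,i\,d\sigma = i\,e^{\gamma(t-\tau)}\underbrace{\int_{-\infty}^{\infty} e^{i\sigma(t-\tau)}\,d\sigma}_{=\,2\pi\delta(t-\tau)\ \text{by (18.28)}} = 2\pi i\,\delta(t-\tau).$$

The last step follows because $\delta(t-\tau) = 0$ unless $t = \tau$, in which case the exponent of the exponential is zero. Substituting this in (29.73) and noting that t > 0, so that $t = \tau$ lies within the range of the τ-integration, we get RHS = f(t).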

To see why the integration line must lie to the right of all singularities, take the Laplace transform of both sides of (29.72):

$$\mathcal{L}[f(t)] = \frac{1}{2\pi i}\int_0^\infty e^{-st}\left(\int_{\gamma - i\infty}^{\gamma + i\infty} F(\sigma)\,e^{\sigma t}\,d\sigma\right)dt = \frac{1}{2\pi i}\int_{\gamma - i\infty}^{\gamma + i\infty} F(\sigma)\,d\sigma\int_0^\infty e^{(\sigma - s)t}\,dt = -\frac{1}{2\pi i}\int_{\gamma - i\infty}^{\gamma + i\infty}\frac{F(\sigma)}{\sigma - s}\,d\sigma,$$

assuming that $\text{Re}(s) > \text{Re}(\sigma) = \gamma$. If $F(\sigma)$ is analytic to the right of the Bromwich contour, then, closing the contour with an infinite semicircle on the right, there will be a single pole at $\sigma = s$ inside the closed contour. The residue theorem then gives the value of the integral as $-2\pi i F(s)$, with the negative sign coming from the clockwise integration, so that $\mathcal{L}[f(t)] = F(s)$, as it should be. If any of the poles of F were on the right of the Bromwich contour, we would not obtain $-2\pi i F(s)$ for the integral.
Example 29.3.8. Let us find the inverse Laplace transform of $F(s) = 1/(s^2 + \omega^2)$. This is given by

$$f(t) = \frac{1}{2\pi i}\int_{\gamma - i\infty}^{\gamma + i\infty}\frac{e^{st}}{s^2 + \omega^2}\,ds,$$

where the contour of integration includes the infinite semicircle to the left. The poles of the integrand are at $\pm i\omega$, so as long as γ > 0, the contour encloses both poles. The residue theorem then yields

$$f(t) = \frac{2\pi i}{2\pi i}\left[\operatorname{Res}\left(\frac{e^{st}}{(s - i\omega)(s + i\omega)}\right)_{s = i\omega} + \operatorname{Res}\left(\frac{e^{st}}{(s - i\omega)(s + i\omega)}\right)_{s = -i\omega}\right] = \frac{e^{i\omega t}}{2i\omega} + \frac{e^{-i\omega t}}{-2i\omega} = \frac{1}{\omega}\sin\omega t,$$

which is the expected result (see Example 29.3.1).


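We can similarly find the inverse Laplace transform of $F(s) = s/(s^2 + \omega^2)$:

$$f(t) = \frac{1}{2\pi i}\int_{\gamma - i\infty}^{\gamma + i\infty}\frac{s\,e^{st}}{s^2 + \omega^2}\,ds.$$

The contour of integration again includes the infinite semicircle to the left, and the poles of the integrand are at $\pm i\omega$, as above. The residue theorem now yields

$$f(t) = \operatorname{Res}\left(\frac{s\,e^{st}}{(s - i\omega)(s + i\omega)}\right)_{s = i\omega} + \operatorname{Res}\left(\frac{s\,e^{st}}{(s - i\omega)(s + i\omega)}\right)_{s = -i\omega} = \frac{i\omega\,e^{i\omega t}}{2i\omega} + \frac{-i\omega\,e^{-i\omega t}}{-2i\omega} = \cos\omega t,$$

which is also treated in Example 29.3.1.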
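The Mellin inversion integral (29.72) can also be evaluated numerically along the Bromwich line instead of by residues. The sketch below does this for $F(s) = 1/(s^2 + \omega^2)$ and compares the result with $\sin\omega t/\omega$. Several choices in it are assumptions: the value ω = 2, the real part c = 1 of the contour (c plays the role of γ in (29.72) and must exceed the real parts of the poles at $\pm 2i$), the cutoff `sigma_max`, and the use of conjugate symmetry $F(\bar s) = \overline{F(s)}$ to fold the line onto σ ≥ 0.

```python
import cmath
import math

def bromwich_inverse(F, t, c=1.0, sigma_max=200.0, n=40000):
    """Approximate f(t) = (1/2 pi i) times the integral of F(s) e^{st} along Re s = c, for real f.

    With s = c + i sigma and F(conj s) = conj F(s), this equals
    (e^{ct}/pi) times the integral over sigma from 0 to infinity of Re[F(c + i sigma) e^{i sigma t}].
    The sigma-integral is cut off at sigma_max and evaluated with Simpson's rule (n even).
    """
    h = sigma_max / n

    def integrand(sigma):
        return (F(complex(c, sigma)) * cmath.exp(1j * sigma * t)).real

    total = integrand(0.0) + integrand(sigma_max)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * integrand(i * h)
    return math.exp(c * t) / math.pi * total * h / 3

OMEGA = 2.0                                 # hypothetical value of omega
F = lambda s: 1.0 / (s * s + OMEGA**2)      # poles at +/- 2i, so any c > 0 is allowed
```

Here F(s) decays like $1/|s|^2$ along the line, which keeps the truncation error small. A transform such as $s/(s^2 + \omega^2)$ decays only like $1/|s|$ and would need a much larger cutoff.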


29.4 Problems 

29.1. Find directly the Fourier transform of

(a) the constant function $f(x) = C$, and

(b) the Dirac delta function $\delta(x)$.

29.2. Show the second identity in (29.8). 

29.3. Show that the inverse of a sine transform is another sine transform. 

29.4. Show (29.9), the linearity property of the Fourier transform and its inverse.

29.5. Suppose that $\tilde f(k)$ is the inverse Fourier transform of $f(x)$. Show that the inverse Fourier transform of $f(x + a)$ is $e^{iak}\tilde f(k)$.


29.6. Show that if $f(t) = \cos\omega_0 t$, then

$$\tilde f(\omega) = \sqrt{\frac{\pi}{2}}\left[\delta(\omega - \omega_0) + \delta(\omega + \omega_0)\right].$$

29.7. Show that

(a) $g(x)$ is real if and only if $\tilde g^*(k) = \tilde g(-k)$,

(b) $g(x)$ is imaginary if and only if $\tilde g^*(k) = -\tilde g(-k)$, and

(c) if $g(x)$ is even (odd), then $\tilde g(k)$ is also even (odd).


29.8. Evaluate the Fourier transform of

$$g(x) = \begin{cases} b - b|x|/a & \text{if } |x| < a,\\ 0 & \text{if } |x| > a. \end{cases}$$


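29.9. Let

$$f(t) = \begin{cases} \sin\omega_0 t & \text{if } |t| < T,\\ 0 & \text{if } |t| > T. \end{cases}$$

Show that

$$\tilde f(\omega) = -\frac{i}{\sqrt{2\pi}}\left[\frac{\sin[(\omega - \omega_0)T]}{\omega - \omega_0} - \frac{\sin[(\omega + \omega_0)T]}{\omega + \omega_0}\right].$$

Verify the uncertainty relation $\Delta\omega\,\Delta t \approx 4\pi$.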

29.10. If $f(x) = g(x + a)$, show that $\tilde f(k) = e^{-iak}\tilde g(k)$.

29.11. For $a > 0$, find the Fourier transform of $f(x) = e^{-a|x|}$. Is $\tilde f(k)$ symmetric? Is it real? Verify the uncertainty relations.

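29.12. The displacement of a damped harmonic oscillator is given by

$$f(t) = \begin{cases} A\,e^{-\alpha t}e^{i\omega_0 t} & \text{if } t > 0,\\ 0 & \text{if } t < 0. \end{cases}$$

Find $\tilde f(\omega)$ and show that the frequency distribution $|\tilde f(\omega)|^2$ is given by

$$|\tilde f(\omega)|^2 = \frac{A^2}{2\pi}\,\frac{1}{(\omega - \omega_0)^2 + \alpha^2}.$$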

29.13. Prove the convolution theorem for the Fourier transform:

$$\int_{-\infty}^{\infty} f(x)\,g(y - x)\,dx = \int_{-\infty}^{\infty}\tilde f(k)\,\tilde g(k)\,e^{iky}\,dk.$$

What will this give when y = 0?

29.14. Prove Parseval's relation for Fourier transforms:

$$\int_{-\infty}^{\infty} f(x)\,g^*(x)\,dx = \int_{-\infty}^{\infty}\tilde f(k)\,\tilde g^*(k)\,dk.$$
29.15. Find the sine and cosine transforms of $e^{-ax}$.

29.16. Following Example 29.1.6, substitute the Fourier transform of the wave function $\Psi(x, t)$ in the one-dimensional wave equation

$$\frac{1}{c^2}\frac{\partial^2\Psi}{\partial t^2} = \frac{\partial^2\Psi}{\partial x^2},$$

and solve the differential equation in t to get

$$\tilde\Psi(k, t) = C(k)\,e^{\pm ickt}.$$

Assuming that the initial shape of the wave $\Psi(x, 0)$ is given by a function $f(x)$, show that the solution $\Psi(x, t)$ can be written as

$$\Psi(x, t) = f(x \pm ct).$$

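29.17. Show the relation used in Example 29.3.6:

$$\frac{\pi}{2} - \tan^{-1}\left(\frac{s}{\omega}\right) = \tan^{-1}\left(\frac{\omega}{s}\right).$$

Hint: Let x denote the left-hand side and $\alpha = \tan^{-1}(s/\omega)$. Take the tan of both sides of the definition of x and use $\cot\alpha = \tan(\pi/2 - \alpha) = 1/\tan\alpha$.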

29.18. Let $f(t) = \sin\omega t$ be the periodic function of (29.61) and verify that the equation holds (for $T = 2\pi/\omega$). Do the same for $f(t) = \cos\omega t$.

29.19. Find the Laplace transform of the periodic sawtooth function with period T defined by

$$V(t) = V_0\,\frac{t}{T} \quad\text{for } 0 < t < T.$$

29.20. Find the Laplace transform of $2t + 4e^{2t} - 3\cos 3t$.

29.21. Compute $\mathcal{L}[\cosh^2\gamma t]$ and $\mathcal{L}[\sinh^2\gamma t]$.

29.22. Compute $\mathcal{L}[\cos^2\omega t]$ and $\mathcal{L}[\sin^2\omega t]$ directly from the definition of the Laplace transform. Now show that

$$\mathcal{L}[\cos^2\omega t] = \mathcal{L}[1] - \mathcal{L}[\sin^2\omega t].$$

29.23. A function N(t) is called a null function if

$$\int_0^t N(u)\,du = 0$$

for all t > 0. Show that $\mathcal{L}[N(t)] = 0$.

29.24. Compute $\mathcal{L}[e^{2t}\sin 3t]$, $\mathcal{L}[t^2e^{-\gamma t}]$, $\mathcal{L}^{-1}[e^{-2s}/s^3]$, and
29.25. Find $\mathcal{L}[e^{3t}/\sqrt{t}]$, $\mathcal{L}[\sqrt{t}]$, and $\mathcal{L}^{-1}[e^{-2s}/\sqrt{s}]$.

29.26. (a) Show that

$$\frac{\partial}{\partial\nu}t^{\nu-1} = t^{\nu-1}\ln t.$$

(b) Now use (29.52) to prove that

$$\mathcal{L}[t^{\nu-1}\ln t] = \frac{\Gamma'(\nu) - \Gamma(\nu)\ln s}{s^\nu}.$$


29.27. Using the Laplace transform, solve the following initial-value problems:

(a) $\dfrac{d^2x}{dt^2} + 4x = \sin t$, $x(0) = 1$, $\dot x(0) = 0$

( b ) ^ - % = te?, x(0) = 2, x(0) = 1 


(c) S + 5 - 9(1 t] 5 


c(0) = 1, x(0) = —1, where 8 is the step 


function. 
(d) $\dfrac{d^2x}{dt^2} + x = \theta(\pi - t)\cos t$, $x(0) = 0$, $\dot x(0) = 0$, where θ is the step function.


29.28. Using the Laplace transform, solve the following boundary-value problems:
d 2 x 

(a) + u,2x = sinwi > ^(0) = x (^j) = 7r - 

( b ) ^f+w 2 x = f, x(0) = 1 , x(%) = — 1 . 


29 . 29 . Find £ - 1 )—tt] and £ 1 [ S ^- Q ^ ] using Mellin inversion integral 
(29.72). 
In a typical multivariable extremum problem, you are given a function of n variables $f(x_1, x_2, \ldots, x_n)$ and asked to find those n values of the variables that maximize or minimize the function. The procedure is, of course, to set the partial derivative of the function with respect to each variable equal to zero and solve the resulting equations.

Geometrically, f is a function in an n-dimensional space, and the problem is to find the point in that space at which f has the highest (or lowest) value compared to the neighboring points. There is another geometric way of looking at the extremum problem. Think of $(x_1, x_2, \ldots, x_n)$ as a piecewise linear path in a two-dimensional coordinate system. The horizontal axis is restricted to the values 1, 2, ..., n, and for each of these values i the value of the corresponding variable $x_i$ determines one point with coordinates $(i, x_i)$. Connecting the neighboring points by a straight line segment produces the path. Figure 30.1 shows a couple of such paths.



Figure 30.1: For each integer i between 1 and n, pick the real number $x_i$ and draw a point with coordinates $(i, x_i)$. Connect these points to form a path. Two such paths are shown for n = 5.


The extremum problem can now be stated in terms of paths: Find the path for which f has either the largest or the smallest value compared with its value at the neighboring paths. And to do so, we differentiate with respect to a point of the path. But let's be more general in anticipation of the problems typical of this chapter. Let $x_\alpha$ be a variable where α is not necessarily an integer between 1 and n. Differentiate the function with respect to $x_\alpha$ and set the result equal to zero:

$$\frac{\partial f}{\partial x_\alpha} = \sum_{i=1}^n\frac{\partial f}{\partial x_i}\frac{\partial x_i}{\partial x_\alpha} = \sum_{i=1}^n\frac{\partial f}{\partial x_i}\,\delta_{\alpha i} = 0. \qquad (30.1)$$

If α is not equal to one of the integers between 1 and n, the sum vanishes identically, i.e., the left-hand side is identically zero because f is not a function of $x_\alpha$. However, if α is one of the integers between 1 and n, (30.1) gives one of the equations to be solved for determining the extremizing path.
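To make the path picture concrete, the sketch below takes f to be the length of a piecewise linear path through the points $(i h, y_i)$. The endpoints are held fixed, and (30.1) supplies one partial derivative per interior point. Instead of solving $\partial f/\partial y_i = 0$ directly, it applies plain gradient descent, which is a swapped-in numerical technique. The number of points, step size, iteration count, and zigzag starting path are all hypothetical choices. The minimizing path it finds is the straight line.

```python
import math

def path_length(ys, h):
    """f(y_0, ..., y_{n+1}): length of the piecewise-linear path through (i*h, ys[i])."""
    return sum(math.hypot(h, ys[i + 1] - ys[i]) for i in range(len(ys) - 1))

def length_gradient(ys, h):
    """Partial derivatives of path_length with respect to each y_i; endpoints get zero (held fixed)."""
    grad = [0.0] * len(ys)
    for i in range(1, len(ys) - 1):
        left = (ys[i] - ys[i - 1]) / math.hypot(h, ys[i] - ys[i - 1])
        right = (ys[i + 1] - ys[i]) / math.hypot(h, ys[i + 1] - ys[i])
        grad[i] = left - right
    return grad

def shortest_discrete_path(ya, yb, n=9, step=0.02, iterations=10000):
    """Gradient descent on the n interior heights of a path on [0, 1], starting from a zigzag."""
    h = 1.0 / (n + 1)
    ys = [ya] + [0.5 * (-1) ** i for i in range(1, n + 1)] + [yb]
    for _ in range(iterations):
        grad = length_gradient(ys, h)
        ys = [y - step * g for y, g in zip(ys, grad)]
    return ys, h

ys, h = shortest_discrete_path(0.0, 1.0)
straight = [i * h for i in range(len(ys))]   # the straight line from (0, 0) to (1, 1)
```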


30.1 Variational Problem 

Our treatment of the extremum problem above in terms of paths was motivated by situations in which variations of smooth paths are to be considered. A typical variational problem has a function whose value depends on the path, i.e., it takes a path and puts out a number. We say that it is a functional, because its argument is a function rather than a set of numbers.

If L is a functional and x(t) represents a path in the tx-plane, then the value of the functional for this path is represented by L[x]. The most common functional integrates a certain function of x(t) and $\dot x(t)$ over some interval (a, b). If $L(x, \dot x, t)$ is such a function, then

$$L[x] = \int_a^b L(x(t), \dot x(t), t)\,dt. \qquad (30.2)$$

For every path, the integrand becomes a function of t which can be integrated to give a single number, and the variational problem asks for the path that yields either the largest or the smallest such number.

Example 30.1.1. Before delving into formalism, let's look at a very simple concrete example. Take two points $P_a = (a, y_a)$ and $P_b = (b, y_b)$ in the xy-plane. Consider points $P_Y = ((a+b)/2, Y)$ lying on the perpendicular bisector of the interval (a, b), and the path consisting of the line segments $P_aP_Y$ and $P_YP_b$ as shown in Figure 30.2. For what value of Y is the length of this path minimum?

The length L of the path is given by

$$L = \int_a^b\sqrt{1 + y'^2}\,dx. \qquad (30.3)$$
Figure 30.2: Here, a path consists of only two line segments. The middle point $P_Y$ is constrained to move on the vertical line on which it is located to produce different paths.

The equation of the path can be shown to be

$$y(x) = \begin{cases} \dfrac{2(Y - y_a)}{b - a}\,x + \dfrac{(a+b)y_a - 2aY}{b - a} & \text{if } a \le x \le (a+b)/2,\\[2ex] \dfrac{2(y_b - Y)}{b - a}\,x + \dfrac{2bY - (a+b)y_b}{b - a} & \text{if } (a+b)/2 \le x \le b. \end{cases}$$

Substituting this in the integral gives

$$L = \int_a^{(a+b)/2}\sqrt{1 + \frac{4(Y - y_a)^2}{(b - a)^2}}\,dx + \int_{(a+b)/2}^b\sqrt{1 + \frac{4(y_b - Y)^2}{(b - a)^2}}\,dx = \frac{1}{2}\left[\sqrt{(b - a)^2 + 4(Y - y_a)^2} + \sqrt{(b - a)^2 + 4(y_b - Y)^2}\right].$$

Differentiating with respect to Y and setting the result equal to zero leads to the following equation:

$$\frac{Y - y_a}{\sqrt{(b - a)^2 + 4(Y - y_a)^2}} = \frac{y_b - Y}{\sqrt{(b - a)^2 + 4(y_b - Y)^2}}.$$

Square both sides and simplify to get $(Y - y_a)^2 = (y_b - Y)^2$. Since the two sides of the equation above must have the same sign, $Y - y_a = y_b - Y$, whose solution is $Y = (y_a + y_b)/2$, placing $P_Y$ on the line joining $P_a$ and $P_b$. Thus among all the paths $P_aP_YP_b$ the shortest is the straight line joining $P_a$ and $P_b$. ■
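The one-parameter minimization of Example 30.1.1 can be reproduced numerically. The sketch below minimizes the length formula L(Y) derived above by ternary search, which works here because L(Y) is a sum of two convex functions. It then checks that the minimizer is $(y_a + y_b)/2$ and that the minimal length equals the straight-line distance. The endpoints are hypothetical.

```python
import math

def two_segment_length(Y, a, b, ya, yb):
    """L(Y) of Example 30.1.1: the length of the path P_a P_Y P_b."""
    return 0.5 * (math.sqrt((b - a) ** 2 + 4 * (Y - ya) ** 2)
                  + math.sqrt((b - a) ** 2 + 4 * (yb - Y) ** 2))

def ternary_minimize(func, lo, hi, iterations=200):
    """Shrink [lo, hi] around the minimum of a convex function of one variable."""
    for _ in range(iterations):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if func(m1) < func(m2):
            hi = m2
        else:
            lo = m1
    return 0.5 * (lo + hi)

a, b, ya, yb = 0.0, 2.0, 1.0, 5.0   # hypothetical endpoints P_a = (0, 1), P_b = (2, 5)
Y_best = ternary_minimize(lambda Y: two_segment_length(Y, a, b, ya, yb), -10.0, 10.0)
```

Because L(Y) is very flat near its minimum, floating-point comparisons limit the accuracy of `Y_best` to roughly the square root of machine precision.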


30.1.1 Euler-Lagrange Equation 

The preceding example showed that from among paths consisting of two specific straight line segments, the one whose middle point lies on the straight line joining the two end points gives the shortest length. What if the point $P_Y$ is not on the perpendicular bisector of (a, b), or if the path has more than three points? There is a procedure which picks the minimizing path from among all possible paths. Let's discuss this procedure.

Going back to Equation (30.2), we ask if there is a process whereby one can take the derivative of L[x], set it equal to zero, and solve for the desired path. Is there a derivative with respect to a path? To find out, let's see if we can generalize (30.1) from the discrete case of a path consisting of only n points to a continuous path. The derivative with respect to a path is called a functional derivative, and δ is used instead of ∂ to symbolize it. So, let's write

$$\frac{\delta L[x]}{\delta x(\tau)} = \frac{\delta}{\delta x(\tau)}\int_a^b L(x(t), \dot x(t), t)\,dt = \int_a^b\frac{\delta}{\delta x(\tau)}L(x(t), \dot x(t), t)\,dt. \qquad (30.4)$$

In analogy with (30.1), and noting that L is to be considered as an ordinary function (not functional) of x, $\dot x$, and t, we have

$$\frac{\delta}{\delta x(\tau)}L(x(t), \dot x(t), t) = \frac{\partial L}{\partial x}\frac{\delta x(t)}{\delta x(\tau)} + \frac{\partial L}{\partial \dot x}\frac{\delta \dot x(t)}{\delta x(\tau)}, \qquad (30.5)$$

because t is independent of x(τ). In the discrete case, we had $\partial x_i/\partial x_\alpha = \delta_{\alpha i}$. What is the generalization of the Kronecker delta to the continuous case? The Dirac delta function! This can be shown more rigorously, but the proof is outside the scope of this book. So, let's write the fundamental functional derivative:

$$\frac{\delta x(t)}{\delta x(\tau)} = \delta(t - \tau). \qquad (30.6)$$

What about the functional derivative in the second term of (30.5)? Using the definition of the derivative and (30.6), we have

$$\frac{\delta \dot x(t)}{\delta x(\tau)} = \frac{\delta}{\delta x(\tau)}\lim_{\epsilon\to 0}\frac{x(t + \epsilon) - x(t)}{\epsilon} = \lim_{\epsilon\to 0}\frac{1}{\epsilon}\left(\frac{\delta x(t + \epsilon)}{\delta x(\tau)} - \frac{\delta x(t)}{\delta x(\tau)}\right) = \lim_{\epsilon\to 0}\frac{1}{\epsilon}\left[\delta(t + \epsilon - \tau) - \delta(t - \tau)\right] = \frac{d}{dt}\delta(t - \tau). \qquad (30.7)$$

Putting (30.6) and (30.7) in (30.5) and the result in (30.4), we obtain

$$\frac{\delta L[x]}{\delta x(\tau)} = \int_a^b\left[\frac{\partial L}{\partial x}\,\delta(t - \tau) + \frac{\partial L}{\partial \dot x}\,\frac{d}{dt}\delta(t - \tau)\right]dt = \frac{\partial L}{\partial x}(\tau) - \frac{d}{d\tau}\frac{\partial L}{\partial \dot x}(\tau), \qquad (30.8)$$

where in the last step we used the properties of the Dirac delta function and its derivative as given in (5.10) and (5.11). We have assumed that τ lies in the interval (a, b).

Having found the functional derivative, we now equate it to zero and find the equation that determines the path (the function x(t)) which extremizes the functional. The equation is

$$\frac{\partial L}{\partial x} - \frac{d}{dt}\frac{\partial L}{\partial \dot x} = 0 \qquad (30.9)$$

and is called the Euler-Lagrange equation. It is at the heart of all variational problems. If we know the function L, we can differentiate it, substitute the derivatives in (30.9) and solve the resulting differential equation. We should emphasize that a path could be written as y(x) or any other form, depending on the variables used in a particular problem.
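Equation (30.8) can be illustrated by going back to the discrete picture of (30.1). Sample the path at spacing Δt, approximate the functional by a sum, and take an ordinary partial derivative with respect to one sample $x_k$. Dividing that derivative by Δt should reproduce $\partial L/\partial x - (d/dt)\,\partial L/\partial\dot x$ at $\tau = t_k$. The sketch below does this for the hypothetical Lagrangian $L = \dot x^2/2 - x^2/2$ and the test path $x(t) = t^2$. Neither comes from the text. For these choices (30.8) gives $-x - \ddot x = -\tau^2 - 2$.

```python
def discrete_functional(xs, dt):
    """Forward-difference sum approximating the integral of L = xdot**2/2 - x**2/2 (hypothetical L)."""
    total = 0.0
    for i in range(len(xs) - 1):
        xdot = (xs[i + 1] - xs[i]) / dt
        total += dt * (0.5 * xdot**2 - 0.5 * xs[i] ** 2)
    return total

def functional_derivative(xs, dt, k, h=1e-4):
    """Central-difference partial derivative with respect to x_k, divided by dt."""
    plus, minus = list(xs), list(xs)
    plus[k] += h
    minus[k] -= h
    return (discrete_functional(plus, dt) - discrete_functional(minus, dt)) / (2 * h) / dt

N = 100
DT = 1.0 / N
TS = [i * DT for i in range(N + 1)]
XS = [t * t for t in TS]        # test path x(t) = t^2 on [0, 1]
K = N // 2                      # the interior point tau = 0.5

expected = -TS[K] ** 2 - 2.0    # dL/dx - d/dt dL/dxdot = -x - xddot at tau
```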

Example 30.1.2. Shortest Length Example 30.1.1 looked at very specific paths connecting two points and found that the straight-line path minimizes the length. Is this true for all paths?

For any path y(x), the length between $(a, y_a)$ and $(b, y_b)$ is given by (30.3), where the independent variable is x and the dependent variable is y. Thus, $L = \sqrt{1 + y'^2}$ and the Euler-Lagrange equation becomes

$$\frac{\partial L}{\partial y} - \frac{d}{dx}\frac{\partial L}{\partial y'} = 0 \quad\text{or}\quad \frac{d}{dx}\left(\frac{y'}{\sqrt{1 + y'^2}}\right) = 0. \qquad (30.10)$$

Differentiating the expression inside the parentheses yields

$$\frac{y''}{(1 + y'^2)^{3/2}} = 0, \quad\text{or}\quad y'' = 0, \quad\text{or}\quad y = cx + d,$$

where c and d are the constants of integration. This is the equation of a straight line. Thus out of all the possible paths between $(a, y_a)$ and $(b, y_b)$, the straight line is the one that extremizes the length. Actually, we don't yet know whether the straight line gives the shortest or the longest distance. The Euler-Lagrange equation, being a condition on the first derivative, is necessary but not sufficient. As in calculus, to show minimality one has to look at the second derivatives. We shall do this later. ■


30.1.2 Beltrami identity 

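Most variational problems have an L which is independent of t. In such a case, the Euler-Lagrange equation simplifies considerably. Consider the total derivative of L with respect to t:

$$\frac{dL}{dt} = \frac{\partial L}{\partial x}\dot x + \frac{\partial L}{\partial \dot x}\frac{d\dot x}{dt}.$$

Substitute for $\partial L/\partial x$ from the Euler-Lagrange equation to obtain

$$\frac{dL}{dt} = \dot x\frac{d}{dt}\frac{\partial L}{\partial \dot x} + \frac{\partial L}{\partial \dot x}\frac{d\dot x}{dt} = \frac{d}{dt}\left(\dot x\frac{\partial L}{\partial \dot x}\right) \quad\text{or}\quad \frac{d}{dt}\left(L - \dot x\frac{\partial L}{\partial \dot x}\right) = 0.$$

This gives the Beltrami identity:

$$L - \dot x\frac{\partial L}{\partial \dot x} = C. \qquad (30.11)$$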

Example 30.1.3. The Brachistochrone Problem A bead slides on frictionless bars of various shapes due to gravity. What shape gives the shortest time? This is the famous brachistochrone problem, which started the calculus of variations. Specifically, consider various paths connecting $P_a = (x_a, y_a)$ and $P_b = (x_b, y_b)$ with $y_b < y_a$. A mass m starts from rest at $P_a$ and moves on a frictionless path from $P_a$ to $P_b$. Find the equation of the path that yields the shortest time.

For each element ds of the path, the time of travel is dt = ds/v, where v is the speed at ds. If ds is located at height y above the ground, then conservation of energy gives

$$mgy_a = \tfrac{1}{2}mv^2 + mgy \quad\text{or}\quad v = \sqrt{2g(y_a - y)}.$$

Therefore,

$$t = \int_{P_a}^{P_b}\frac{ds}{v} = \int_{P_a}^{P_b}\frac{\sqrt{dx^2 + dy^2}}{\sqrt{2g(y_a - y)}} = \int_{x_a}^{x_b}\sqrt{\frac{1 + y'^2}{2g(y_a - y)}}\,dx,$$

and $L(y, y') = \sqrt{(1 + y'^2)/[2g(y_a - y)]}$. Since L is independent of x, we can use the Beltrami identity:

$$\sqrt{\frac{1 + y'^2}{2g(y_a - y)}} - y'\frac{\partial}{\partial y'}\sqrt{\frac{1 + y'^2}{2g(y_a - y)}} = C \quad\text{or}\quad \sqrt{1 + y'^2} - y'\frac{\partial}{\partial y'}\sqrt{1 + y'^2} = C\sqrt{2g(y_a - y)}.$$

Differentiating and simplifying the left-hand side gives

$$\frac{1}{\sqrt{1 + y'^2}} = C\sqrt{2g(y_a - y)}.$$

Square both sides, introduce the new constant $k = 1/(2gC^2)$, and solve for y' to get

$$\frac{dy}{dx} = \sqrt{\frac{k}{y_a - y} - 1}.$$

The substitution $u = k/(y_a - y)$ gives $dy = (k/u^2)\,du$ and changes the differential equation to

$$\frac{k}{u^2}\frac{du}{dx} = \sqrt{u - 1} \quad\text{or}\quad \frac{du}{u^2\sqrt{u - 1}} = \frac{1}{k}\,dx.$$

Integrating both sides, using an integral table, yields

$$\frac{x}{k} = \frac{\sqrt{u - 1}}{u} + \tan^{-1}\sqrt{u - 1} + C.$$

As $y \to y_a$, $u \to \infty$ and $x \to x_a$. Therefore, $C = x_a/k - \pi/2$, and the solution becomes

$$\frac{x - x_a}{k} = \frac{\sqrt{u - 1}}{u} + \tan^{-1}\sqrt{u - 1} - \frac{\pi}{2}. \qquad (30.12)$$

Let $\tan^{-1}\sqrt{u - 1} = \varphi$. Then $\sqrt{u - 1} = \tan\varphi$ and

$$u = 1 + \tan^2\varphi = \sec^2\varphi \quad\text{or}\quad y = y_a - k\cos^2\varphi. \qquad (30.13)$$

Substituting u in terms of φ in (30.12) yields

$$\frac{x - x_a}{k} = \sin\varphi\cos\varphi + \varphi - \frac{\pi}{2}.$$

Finally, defining $\theta = 2\varphi - \pi$, this equation and (30.13) give x and y in terms of the parameter θ:

$$x - x_a = \frac{k}{2}(\theta - \sin\theta), \qquad y - y_a = -\frac{k}{2}(1 - \cos\theta).$$

This is the parametric equation of a cycloid. ■
Example 30.1.4. The Soap Film Problem When a film of soap is stretched across a frame, the surface tension causes the area to be a minimum. The film of Figure 30.3 is an area of revolution with an element of area shown. This element of area is $2\pi y\sqrt{dx^2 + dy^2}$. Therefore, we have to extremize the functional

$$L[y] = 2\pi\int_0^h y\sqrt{1 + y'^2}\,dx, \qquad y(0) = a, \quad y(h) = b.$$

Since $L(x, y, y') = y\sqrt{1 + y'^2}$ is independent of x, we can use the Beltrami identity and get

$$y\sqrt{1 + y'^2} - y'\frac{\partial}{\partial y'}\left(y\sqrt{1 + y'^2}\right) = C_1.$$

This yields

$$y = C_1\sqrt{1 + y'^2} \quad\text{or}\quad y' = \sqrt{(y/C_1)^2 - 1}.$$

Let $u = y/C_1$ to simplify this equation to

$$C_1 u' = \sqrt{u^2 - 1} \quad\text{or}\quad C_1\frac{du}{\sqrt{u^2 - 1}} = dx,$$

which can be easily integrated to give

$$x = C_1\ln\left(u + \sqrt{u^2 - 1}\right) + C_2 \quad\text{or}\quad u + \sqrt{u^2 - 1} = e^{(x - C_2)/C_1} \equiv e^v,$$

where v is the exponent of the exponential. From this, we get

$$\sqrt{u^2 - 1} = e^v - u \quad\text{or}\quad u^2 - 1 = e^{2v} - 2ue^v + u^2 \quad\text{or}\quad u = \frac{e^v + e^{-v}}{2} = \cosh v.$$

Returning to y and x, we obtain

$$\frac{y}{C_1} = \cosh\left(\frac{x - C_2}{C_1}\right) \quad\text{or}\quad y = C_1\cosh\left(\frac{x - C_2}{C_1}\right).$$

The constants $C_1$ and $C_2$ can be found by the conditions y(0) = a, y(h) = b. ■
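As an illustration of fixing the constants, the sketch below treats a hypothetical symmetric case a = b = 1, h = 1. There, symmetry suggests $C_2 = h/2$, and the boundary condition reduces to $C_1\cosh(h/(2C_1)) = a$. This is solved by bisection on an assumed bracket [0.5, 1]. The bracket selects one root of an equation that can have more than one, so it is a choice rather than a general recipe. The sketch then checks numerically that the resulting catenary satisfies the Beltrami identity $y/\sqrt{1 + y'^2} = C_1$, with y' from a central difference.

```python
import math

def catenary_constant(a, h, lo=0.5, hi=1.0, iterations=200):
    """Bisection for C1 in C1 cosh(h / (2 C1)) = a (symmetric rings, C2 = h/2)."""
    f = lambda c: c * math.cosh(h / (2.0 * c)) - a
    if f(lo) * f(hi) > 0:
        raise ValueError("bracket does not contain a sign change")
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if f(lo) * f(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

RING_RADIUS, SEPARATION = 1.0, 1.0   # hypothetical a = b and h
C1 = catenary_constant(RING_RADIUS, SEPARATION)
C2 = SEPARATION / 2.0

def y(x):
    return C1 * math.cosh((x - C2) / C1)

def beltrami_residual(x, dx=1e-5):
    """y / sqrt(1 + y'^2) - C1, with y' from a central difference."""
    yp = (y(x + dx) - y(x - dx)) / (2.0 * dx)
    return y(x) / math.sqrt(1.0 + yp**2) - C1
```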



Figure 30.3: The soap film attaches itself to the two rings in such a way that the area 
obtained is minimum. 
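30.1.3 Several Dependent Variables

The path of (30.2) had only one dependent variable. One can consider paths in an m-dimensional space where L depends on $\{x_\alpha(t)\}_{\alpha=1}^m$ and their derivatives. Such a generalization is straightforward: In (30.4) instead of x(τ), we have $x_\alpha(\tau)$, which changes (30.5) to

$$\frac{\delta}{\delta x_\alpha(\tau)}L(\mathbf{x}(t), \dot{\mathbf{x}}(t), t) = \sum_{\beta=1}^m\left[\frac{\partial L}{\partial x_\beta}\frac{\delta x_\beta(t)}{\delta x_\alpha(\tau)} + \frac{\partial L}{\partial \dot x_\beta}\frac{\delta \dot x_\beta(t)}{\delta x_\alpha(\tau)}\right],$$

where $\mathbf{x} = (x_1, x_2, \ldots, x_m)$. For this we need the equivalent of (30.6) and (30.7), which are easily shown to be

$$\frac{\delta x_\beta(t)}{\delta x_\alpha(\tau)} = \delta_{\alpha\beta}\,\delta(t - \tau), \qquad \frac{\delta \dot x_\beta(t)}{\delta x_\alpha(\tau)} = \delta_{\alpha\beta}\,\frac{d}{dt}\delta(t - \tau).$$

Substituting these in the above sum yields

$$\frac{\delta}{\delta x_\alpha(\tau)}L(\mathbf{x}(t), \dot{\mathbf{x}}(t), t) = \frac{\partial L}{\partial x_\alpha}\,\delta(t - \tau) + \frac{\partial L}{\partial \dot x_\alpha}\,\frac{d}{dt}\delta(t - \tau), \qquad (30.14)$$

which replaces the x and $\dot x$ of (30.8) with $x_\alpha$ and $\dot x_\alpha$. We thus obtain the multivariable version of the Euler-Lagrange equation:

$$\frac{\partial L}{\partial x_\alpha} - \frac{d}{dt}\frac{\partial L}{\partial \dot x_\alpha} = 0, \qquad \alpha = 1, 2, \ldots, m. \qquad (30.15)$$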


30.1.4 Several Independent Variables 

Equation (30.15) is one generalization of the Euler-Lagrange equation. It still corresponds to a path, a (generally) curved line, albeit in a multidimensional space. There is another generalization which is also important: going from a path to a surface. In this case, our dependent variable is a function of several independent variables. So, consider a function φ of m variables, which we collectively denote by x, and instead of (30.2) consider the functional

$$L[\phi] = \int_\Omega d^m x\,\mathcal{L}(\phi; \phi_{,1}, \phi_{,2}, \ldots, \phi_{,m}; \mathbf{x}), \qquad (30.16)$$

where $\phi_{,\alpha}$ denotes the derivative of φ with respect to $x_\alpha$, and Ω is some region in the m-dimensional space. Note the change in notation: we use $\mathcal{L}$ instead of L when integration is over a multidimensional "volume." The variational derivative (30.5) now becomes

$$\frac{\delta}{\delta\phi(\mathbf{y})}\mathcal{L}(\phi; \phi_{,1}, \ldots, \phi_{,m}; \mathbf{x}) = \frac{\partial\mathcal{L}}{\partial\phi}\frac{\delta\phi(\mathbf{x})}{\delta\phi(\mathbf{y})} + \sum_{\alpha=1}^m\frac{\partial\mathcal{L}}{\partial\phi_{,\alpha}}\frac{\delta\phi_{,\alpha}(\mathbf{x})}{\delta\phi(\mathbf{y})}. \qquad (30.17)$$
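Furthermore, (30.6) and (30.7) generalize to

$$\frac{\delta\phi(\mathbf{x})}{\delta\phi(\mathbf{y})} = \delta(\mathbf{x} - \mathbf{y}), \qquad \frac{\delta\phi_{,\alpha}(\mathbf{x})}{\delta\phi(\mathbf{y})} = \frac{\partial}{\partial x_\alpha}\delta(\mathbf{x} - \mathbf{y}). \qquad (30.18)$$

Substituting these in the functional derivative of the integral (30.16) and setting the result equal to zero yields another Euler-Lagrange equation:

$$\frac{\partial\mathcal{L}}{\partial\phi} - \sum_{\alpha=1}^m\frac{\partial}{\partial x_\alpha}\frac{\partial\mathcal{L}}{\partial\phi_{,\alpha}} = 0. \qquad (30.19)$$

Finally, if we have several dependent variables $\{\phi^i\}_{i=1}^N$, collectively represented by Φ, and several independent variables collectively represented by x, then the variational functional becomes

$$L[\Phi] = \int_\Omega d^m x\,\mathcal{L}(\Phi; \Phi_{,1}, \Phi_{,2}, \ldots, \Phi_{,m}; \mathbf{x}), \qquad (30.20)$$

with the variational derivatives

$$\frac{\delta\mathcal{L}}{\delta\phi^j(\mathbf{y})} = \sum_{i=1}^N\frac{\partial\mathcal{L}}{\partial\phi^i}\frac{\delta\phi^i(\mathbf{x})}{\delta\phi^j(\mathbf{y})} + \sum_{i=1}^N\sum_{\alpha=1}^m\frac{\partial\mathcal{L}}{\partial\phi^i_{,\alpha}}\frac{\delta\phi^i_{,\alpha}(\mathbf{x})}{\delta\phi^j(\mathbf{y})} \qquad (30.21)$$

and

$$\frac{\delta\phi^i(\mathbf{x})}{\delta\phi^j(\mathbf{y})} = \delta_{ij}\,\delta(\mathbf{x} - \mathbf{y}), \qquad \frac{\delta\phi^i_{,\alpha}(\mathbf{x})}{\delta\phi^j(\mathbf{y})} = \delta_{ij}\,\frac{\partial}{\partial x_\alpha}\delta(\mathbf{x} - \mathbf{y}). \qquad (30.22)$$

Substitution of these in the functional derivative of (30.20) leads to the Euler-Lagrange equations

$$\frac{\partial\mathcal{L}}{\partial\phi^i} - \sum_{\alpha=1}^m\frac{\partial}{\partial x_\alpha}\frac{\partial\mathcal{L}}{\partial\phi^i_{,\alpha}} = 0, \qquad i = 1, 2, \ldots, N. \qquad (30.23)$$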


In many situations, the variational problem consists of various parts each 
having one or several dependent or independent variables. 


30.1.5 Second Variation 

The Euler-Lagrange equation was obtained by setting the first variational derivative (30.8) equal to zero. As in multivariable calculus, this only locates a candidate extremum. And just as in multivariable calculus, to see if we have a minimum or a maximum, we have to run the second derivative test.

The easiest way to apply the second derivative test in calculus is to consider the Taylor expansion of the function. And since we are interested in local minima and maxima, we ignore the third and higher orders in the Taylor expansion. Now recall from Section 10.7 that the Taylor series of a function f of N independent variables $\{x_i\}_{i=1}^N \equiv \mathbf{x}$ up to the second order around $\mathbf{x}_0$ is

$$f(\mathbf{x}) = f(\mathbf{x}_0) + \sum_{i=1}^N(x_i - x_{0i})\,f_{,i}(\mathbf{x}_0) + \frac{1}{2}\sum_{i,j=1}^N(x_i - x_{0i})(x_j - x_{0j})\,f_{,ij}(\mathbf{x}_0), \qquad (30.24)$$

where

$$f_{,i} \equiv \frac{\partial f}{\partial x_i} \quad\text{and}\quad f_{,ij} \equiv \frac{\partial^2 f}{\partial x_i\,\partial x_j}.$$

If $\mathbf{x}_0$ is an extremum of f, then $f_{,i}(\mathbf{x}_0) = 0$ and the above equation becomes

$$f(\mathbf{x}) - f(\mathbf{x}_0) = \frac{1}{2}\sum_{i,j=1}^N(x_i - x_{0i})(x_j - x_{0j})\,f_{,ij}(\mathbf{x}_0) \equiv \delta^2 f(\mathbf{x}_0), \qquad (30.25)$$

where we introduced the abbreviation $\delta^2 f(\mathbf{x}_0)$, the second variation of f at $\mathbf{x}_0$, for the sum. The test for maximum or minimum of f can now be stated: If for any $\mathbf{x} \ne \mathbf{x}_0$ that is close enough to $\mathbf{x}_0$ the second variation $\delta^2 f(\mathbf{x}_0)$ is positive, then $\mathbf{x}_0$ is a minimum point, and if $\delta^2 f(\mathbf{x}_0)$ is negative, then $\mathbf{x}_0$ is a maximum point.

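The generalization to the variational problem follows from our usual passage from the discrete to the continuous. For the most general integral (30.20), the second variation is

$$\delta^2 L[\Phi_0] = \frac{1}{2}\sum_{i,j=1}^N\int_\Omega d^m x\int_\Omega d^m y\,\left(\phi^i(\mathbf{x}) - \phi_0^i(\mathbf{x})\right)\left(\phi^j(\mathbf{y}) - \phi_0^j(\mathbf{y})\right)\frac{\delta^2 L}{\delta\phi^i(\mathbf{x})\,\delta\phi^j(\mathbf{y})}[\Phi_0], \qquad (30.26)$$

where the last term means "find the second variational derivative and evaluate the result at the solution $\Phi_0$ of the Euler-Lagrange equation." For a single dependent variable and several independent variables this becomes

$$\delta^2 L[\phi_0] = \frac{1}{2}\int_\Omega d^m x\int_\Omega d^m y\,\left(\phi(\mathbf{x}) - \phi_0(\mathbf{x})\right)\left(\phi(\mathbf{y}) - \phi_0(\mathbf{y})\right)\frac{\delta^2 L}{\delta\phi(\mathbf{x})\,\delta\phi(\mathbf{y})}[\phi_0], \qquad (30.27)$$

and for a single independent variable and several dependent variables we get

$$\delta^2 L[\mathbf{x}_0] = \frac{1}{2}\sum_{\alpha,\beta=1}^m\int_a^b dt\int_a^b d\tau\,\left(x_\alpha(t) - x_{0\alpha}(t)\right)\left(x_\beta(\tau) - x_{0\beta}(\tau)\right)\frac{\delta^2 L}{\delta x_\alpha(t)\,\delta x_\beta(\tau)}[\mathbf{x}_0], \qquad (30.28)$$

and for the simplest case of a single independent variable and a single dependent variable (30.26) reduces to

$$\delta^2 L[x_0] = \frac{1}{2}\int_a^b dt\int_a^b d\tau\,\left(x(t) - x_0(t)\right)\left(x(\tau) - x_0(\tau)\right)\frac{\delta^2 L}{\delta x(t)\,\delta x(\tau)}[x_0]. \qquad (30.29)$$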
In the calculation of the second variation, we need to find the variational derivatives of second derivatives of dependent variables. It is not hard to show that

$$\frac{\delta\phi^i_{,\alpha\beta}(\mathbf{x})}{\delta\phi^j(\mathbf{y})} = \delta_{ij}\,\frac{\partial^2}{\partial x_\beta\,\partial x_\alpha}\delta(\mathbf{x} - \mathbf{y}). \qquad (30.30)$$


Example 30.1.5. The necessary condition for the straight line to be the shortest distance between two given points is that it satisfy the Euler-Lagrange equation (30.10). Example 30.1.2 showed that $y_0(x) = cx + d$ solves the Euler-Lagrange equation. To see whether this is a minimum, calculate the second variation (30.29). The first variational derivative is given by (30.8), which, with the current symbols for the independent and dependent variables, becomes

$$\frac{\delta L[y]}{\delta y(x)} = \frac{\partial L}{\partial y} - \frac{d}{dx}\frac{\partial L}{\partial y'} = -\frac{d}{dx}\left(\frac{y'}{\sqrt{1+y'^2}}\right) = -\frac{y''}{(1+y'^2)^{3/2}},$$

and

$$\frac{\delta^2 L[y]}{\delta y(x')\,\delta y(x)} = \frac{\delta}{\delta y(x')}\left[-\frac{y''(x)}{(1+y'^2)^{3/2}}\right] = -\frac{1}{(1+y'^2)^{3/2}}\,\frac{\delta y''(x)}{\delta y(x')} - y''\,\frac{\delta}{\delta y(x')}\big(1+y'^2\big)^{-3/2}.$$

Using (30.30), this yields

$$\frac{\delta^2 L[y]}{\delta y(x')\,\delta y(x)} = -\frac{\delta''(x-x')}{(1+y'^2)^{3/2}} + \frac{3y'y''\,\delta'(x-x')}{(1+y'^2)^{5/2}}.$$

Now we have to evaluate this at the solution $y_0(x)$ of the Euler-Lagrange equation, for which $y_0' = c$ and $y_0'' = 0$. Thus,


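$$\frac{\delta^2 L[y]}{\delta y(x')\,\delta y(x)}[y_0] = -\frac{\delta''(x-x')}{(1+c^2)^{3/2}}.$$

Substituting this in (30.29) and using the derivative property (5.12) of the Dirac delta function yields

$$\delta^2 L[y_0] = -\frac{1}{2(1+c^2)^{3/2}}\int_a^b dx\int_a^b dx'\,\big(y(x)-y_0(x)\big)\big(y(x')-y_0(x')\big)\,\delta''(x-x') = -\frac{1}{2(1+c^2)^{3/2}}\int_a^b dx\,\big(y(x)-y_0(x)\big)\frac{d^2}{dx^2}\big(y(x)-y_0(x)\big).$$

The last integral can be integrated by parts to give

$$\int_a^b \big(y-y_0\big)\frac{d^2}{dx^2}\big(y-y_0\big)\,dx = \Big[\big(y-y_0\big)\frac{d}{dx}\big(y-y_0\big)\Big]_a^b - \int_a^b\left\{\frac{d}{dx}\big(y(x)-y_0(x)\big)\right\}^2 dx = -\int_a^b\left\{\frac{d}{dx}\big(y(x)-y_0(x)\big)\right\}^2 dx,$$

because $y$ and $y_0$ agree at the endpoints. Therefore,

$$\delta^2 L[y_0] = \frac{1}{2(1+c^2)^{3/2}}\int_a^b dx\left\{\frac{d}{dx}\big(y(x)-y_0(x)\big)\right\}^2,$$

which is positive for any $y(x) \neq y_0(x)$ satisfying the boundary conditions. Hence, $y_0(x) = cx + d$ does indeed minimize the distance between any two given points. ■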
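The positivity of $\delta^2 L[y_0]$ can also be checked numerically. The Python sketch below is only an illustration under assumed choices that are not part of the example: the interval $[0, 2]$, the slope $c = 0.75$, and the perturbation $\eta(x) = \sin[\pi(x-a)/(b-a)]$, which vanishes at both ends. The intercept $d$ does not affect the length, so it is omitted. The sketch computes the length of $y_0 + \epsilon\eta$ by Simpson's rule and compares $(L[y_0+\epsilon\eta] - L[y_0])/\epsilon^2$ with the coefficient $\frac{1}{2}(1+c^2)^{-3/2}\int_a^b \eta'^2\,dx$ predicted by the last equation.

```python
import math

def simpson(f, lo, hi, n=2000):
    """Composite Simpson rule on [lo, hi]; n must be even."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(lo + k * h)
    return total * h / 3

# Assumed data: straight line y0 = c x + d on [a, b]
a, b, c = 0.0, 2.0, 0.75

def eta_prime(x):
    """Derivative of the perturbation eta(x) = sin(pi (x - a)/(b - a))."""
    return (math.pi / (b - a)) * math.cos(math.pi * (x - a) / (b - a))

def length(eps):
    """Arc length of y0 + eps*eta; only the slope c + eps*eta' enters."""
    return simpson(lambda x: math.sqrt(1.0 + (c + eps * eta_prime(x)) ** 2), a, b)

L0 = (b - a) * math.sqrt(1.0 + c * c)          # length of the straight segment
coeff = simpson(lambda x: eta_prime(x) ** 2, a, b) / (2 * (1 + c * c) ** 1.5)

for eps in (1e-1, 1e-2, 1e-3):
    print(eps, (length(eps) - L0) / eps**2, coeff)
```

As $\epsilon$ decreases, the middle column should settle on the predicted coefficient. There is no term linear in $\epsilon$ because $y_0$ satisfies the Euler-Lagrange equation, so the first variation vanishes.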

We should emphasize that although the calculation of the second variational derivative is rather straightforward, showing that the second variation $\delta^2 L$ (the integral of the second variational derivative as given in Equations (30.26) to (30.29)) is positive or negative is by no means trivial. Example 30.1.5 is one of those rare cases where the calculation of $\delta^2 L$ is manageable.


30.1.6 Variational Problems with Constraints 


The variational problems treated so far have been problems with boundary conditions, namely that all "paths," or extremal candidates, must go through the same boundary points. In many applications, there are other auxiliary conditions or constraints that the extremal candidates must obey. A typical example is the problem of finding the closed curve enclosing the largest area when the perimeter is a given fixed length. The most elegant way of treating constrained variational problems is via the Lagrange multipliers discussed in Section 12.3.1.

Suppose that we are looking for a curve that not only extremizes $L[x]$ of (30.2), but also is such that another functional,

$$K[x] = \int_a^b G\big(x(t),\dot{x}(t),t\big)\,dt, \qquad (30.31)$$

takes a fixed value $l$. Such a problem is called isoperimetric. In exact analogy with multivariable calculus, we form a new function $L + \lambda G$ and extremize its integral. This means that we have to solve the Euler-Lagrange equation

$$\frac{\partial L}{\partial x} - \frac{d}{dt}\frac{\partial L}{\partial \dot{x}} + \lambda\left(\frac{\partial G}{\partial x} - \frac{d}{dt}\frac{\partial G}{\partial \dot{x}}\right) = 0. \qquad (30.32)$$


Example 30.1.6. As an example of the isoperimetric variational problem, consider all curves of length $l$ in the upper half plane passing through the points $(-a, 0)$ and $(a, 0)$. What is the equation of the curve that together with the interval $[-a, a]$ encloses the largest area? The sought-after function $y(x)$ must extremize

$$L[y] = \int_{-a}^{a} y\,dx,$$

subject to the conditions that

$$y(-a) = 0 = y(a), \qquad K[y] = \int_{-a}^{a}\sqrt{1+y'^2}\,dx = l.$$

Equation (30.32) with $L = y$ and $G = \sqrt{1+y'^2}$ gives

$$1 - \lambda\frac{d}{dx}\left(\frac{y'}{\sqrt{1+y'^2}}\right) = 0.$$

Integrating this yields

$$x - \lambda\frac{y'}{\sqrt{1+y'^2}} = C_1,$$

which can be solved for $y'$ to give

$$y' = \pm\frac{x - C_1}{\sqrt{\lambda^2 - (x - C_1)^2}},$$

whose solution is

$$y = \mp\sqrt{\lambda^2 - (x - C_1)^2} + C_2, \quad\text{or}\quad (x - C_1)^2 + (y - C_2)^2 = \lambda^2.$$

This is a circle of radius $|\lambda|$. The values of the three unknowns $C_1$, $C_2$, and $\lambda$ are determined from the conditions

$$y(-a) = 0 = y(a), \qquad K[y] = l.$$
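A quick numerical comparison illustrates the conclusion of Example 30.1.6. The sketch below assumes $2a < l < \pi a$, so that the optimal arc is smaller than a semicircle; it then uses two facts from elementary geometry rather than from the text: the arc length is $2R\sin^{-1}(a/R)$ for radius $R = |\lambda|$, and the enclosed (segment) area is $R^2\sin^{-1}(a/R) - a\sqrt{R^2 - a^2}$. As a competitor it takes a parabola $y = h(1 - x^2/a^2)$ of the same length, an arbitrary choice made only for the comparison. Both $R$ and $h$ are found by bisection.

```python
import math

def bisect(f, lo, hi, iters=200):
    """Bisection for a root of f in [lo, hi]; f(lo) and f(hi) must differ in sign."""
    f_lo = f(lo)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def circle_arc(a, l):
    """Radius and enclosed area of the circular arc of length l through (-a, 0), (a, 0).
    Assumes 2a < l < pi*a (arc smaller than a semicircle)."""
    radius = bisect(lambda r: 2 * r * math.asin(a / r) - l, a, 1e6 * a)
    area = radius**2 * math.asin(a / radius) - a * math.sqrt(radius**2 - a**2)
    return radius, area

def parabola_arc(a, l):
    """Height and enclosed area of the parabola y = h(1 - x^2/a^2) of arc length l."""
    def arc_length(h):
        k = 2 * h / a
        return a * (math.sqrt(1 + k * k) + math.asinh(k) / k)   # closed-form arc length
    h = bisect(lambda h: arc_length(h) - l, 1e-9, 10 * l)
    return h, 4 * a * h / 3

a, l = 1.0, 2.5
radius, circle_area = circle_arc(a, l)
h, parabola_area = parabola_arc(a, l)
print(radius, circle_area, parabola_area)
```

The circular arc should enclose the larger area, although for this length the margin is only about one percent, which shows how flat the functional is near its maximum.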


There is another type of variational problem with constraint applicable to the case of one independent and several dependent variables, in which the constraint is given by an equation of the form

$$g\big(\mathbf{x}(t),\dot{\mathbf{x}}(t),t\big) = 0.$$

This is called the finite constraint problem and is similar to Equation (12.31), where the discrete index $j$ has been replaced with the continuous index $t$. Thus, the Lagrange multipliers $\lambda_j$ should be replaced with $\lambda(t)$ and the sum in (12.32) replaced with an integral over $t$, which is already present in the extremal problem. Therefore, the problem changes to finding the extremum of

$$\int_a^b \Big\{L\big(\mathbf{x}(t),\dot{\mathbf{x}}(t),t\big) + \lambda(t)\,g\big(\mathbf{x}(t),\dot{\mathbf{x}}(t),t\big)\Big\}\,dt, \qquad (30.33)$$

and the Euler-Lagrange equation becomes

$$\frac{\partial L}{\partial x_i} - \frac{d}{dt}\frac{\partial L}{\partial \dot{x}_i} + \lambda\left(\frac{\partial g}{\partial x_i} - \frac{d}{dt}\frac{\partial g}{\partial \dot{x}_i}\right) - \frac{d\lambda}{dt}\frac{\partial g}{\partial \dot{x}_i} = 0, \qquad i = 1, 2, \ldots, N. \qquad (30.34)$$

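If there are multiple constraint equations,

$$g_a\big(\mathbf{x}(t),\dot{\mathbf{x}}(t),t\big) = 0, \qquad a = 1, 2, \ldots, m,$$

then there will be $m$ Lagrange multipliers and a sum over $a$ in (30.33),

$$\int_a^b \Big\{L\big(\mathbf{x}(t),\dot{\mathbf{x}}(t),t\big) + \sum_{a=1}^{m}\lambda_a(t)\,g_a\big(\mathbf{x}(t),\dot{\mathbf{x}}(t),t\big)\Big\}\,dt, \qquad (30.35)$$

leading to the following Euler-Lagrange equation:

$$\frac{\partial L}{\partial x_i} - \frac{d}{dt}\frac{\partial L}{\partial \dot{x}_i} + \sum_{a=1}^{m}\left[\lambda_a\left(\frac{\partial g_a}{\partial x_i} - \frac{d}{dt}\frac{\partial g_a}{\partial \dot{x}_i}\right) - \frac{d\lambda_a}{dt}\frac{\partial g_a}{\partial \dot{x}_i}\right] = 0, \qquad i = 1, 2, \ldots, N. \qquad (30.36)$$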
Example 30.1.7. Among all curves lying on the sphere of radius $a$ centered at the origin and passing through two points $(x_1, y_1, z_1)$ and $(x_2, y_2, z_2)$, find the shortest one. This is a finite constraint problem with

$$L[y, z] = \int_{x_1}^{x_2}\sqrt{1+y'^2+z'^2}\,dx$$

and

$$g(x, y, z) = x^2 + y^2 + z^2 - a^2.$$

The solution is the set of functions $\{y(x), z(x)\}$ that extremize the integral

$$\int_{x_1}^{x_2}\Big[\sqrt{1+y'^2+z'^2} + \lambda(x)\big(x^2+y^2+z^2-a^2\big)\Big]\,dx,$$

i.e., functions that satisfy the Euler-Lagrange equations (since $g$ contains no derivatives, the $d\lambda/dx$ term of (30.34) drops out)

$$2y\,\lambda(x) - \frac{d}{dx}\frac{y'}{\sqrt{1+y'^2+z'^2}} = 0, \qquad 2z\,\lambda(x) - \frac{d}{dx}\frac{z'}{\sqrt{1+y'^2+z'^2}} = 0.$$

The multiplier $\lambda(x)$ is fixed by requiring the solution to satisfy $g = 0$. Solving these equations, we get the solutions in terms of four constants, which can be determined from the boundary conditions

$$y(x_1) = y_1, \quad y(x_2) = y_2, \quad z(x_1) = z_1, \quad z(x_2) = z_2.$$


30.2 Lagrangian Dynamics 

Variational calculus has become an indispensable tool in physics. Almost all (partial) differential equations of physics can be derived from some variational problem. Furthermore, symmetry considerations, which are the cornerstones of modern fundamental physics, find their natural setting in functionals. And Richard Feynman's elegant and powerful formulation of quantum mechanics is also built on variational techniques.


30.2.1 From Newton to Lagrange 

For most conservative systems one can define functionals whose extremization leads to the differential equations of motion of those systems. The second law of motion for a particle acted on by a conservative force can be written as

$$-\nabla\Phi = m\frac{d\mathbf{v}}{dt} \quad\text{or}\quad -\frac{\partial\Phi}{\partial x_i} = m\frac{dv_i}{dt} \quad\text{or}\quad \left(-\frac{\partial\Phi}{\partial x_i}\right) - \frac{d}{dt}\big(m\dot{x}_i\big) = 0. \qquad (30.37)$$

This looks very much like (30.15)! Let's see if we can construct an $L$ that leads to the equations of mechanics. Use $x$, $y$, and $z$ for the moment, with $n = 3$.
By equating the first term of (30.15) to the first term of the last equation in (30.37), we get

$$\frac{\partial L}{\partial x} = -\frac{\partial\Phi}{\partial x}.$$

Antidifferentiation yields $L = -\Phi(x, y, z) + f(y, z, \dot{x}, \dot{y}, \dot{z})$, where $f$ is the "constant" of integration. If the partials of $L$ with respect to $y$ and $z$ are to be equal to the corresponding partials of $-\Phi$, then $f$ cannot depend on $y$ and $z$. So, $f$ is a function of the velocity components only. If the second term of (30.37) is to equal the second term of (30.15), then

$$m\dot{x} = \frac{\partial L}{\partial \dot{x}} = \frac{\partial f}{\partial \dot{x}} \quad\text{or}\quad f(\dot{x}, \dot{y}, \dot{z}) = \tfrac{1}{2}m\dot{x}^2 + g(\dot{y}, \dot{z}),$$

where $g(\dot{y}, \dot{z})$ is the "constant" of this new integration. Applying the same argument to $y$ and $z$, we conclude that $f$ is just the kinetic energy. Therefore, we arrive at the important conclusion that for a single particle with position vector $\mathbf{r}$, the extremization of the time integral of

$$L(\mathbf{r}, \dot{\mathbf{r}}, t) = -\Phi(\mathbf{r}) + \tfrac{1}{2}m|\dot{\mathbf{r}}|^2 = -\Phi(x, y, z) + \tfrac{1}{2}m\big(\dot{x}^2+\dot{y}^2+\dot{z}^2\big) \qquad (30.38)$$

gives the equation of motion of the particle. $L(\mathbf{r}, \dot{\mathbf{r}}, t)$ is called the Lagrangian of a single particle moving in the potential $\Phi$.
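The claim that the extremum of $\int L\,dt$ reproduces Newton's law can be tested on a discretized action. The sketch below is a hypothetical illustration, not part of the text. It takes a one-dimensional harmonic potential $\Phi(x) = \frac{1}{2}kx^2$, so the true motion is $x = \cos\omega t$ with $\omega = \sqrt{k/m}$. It approximates $\int L\,dt$ by the trapezoid rule, then estimates the first variation in the direction $\eta(t) = \sin(\pi t/T)$, which vanishes at both endpoints. For the true path the first variation should be essentially zero. For a straight-line path with the same endpoints it should not.

```python
import math

m, k = 1.0, 1.0                 # assumed mass and spring constant
omega = math.sqrt(k / m)
T, N = 1.0, 1000                # time interval [0, T] divided into N steps
h = T / N
ts = [n * h for n in range(N + 1)]

def action(path):
    """Trapezoid-rule approximation of the integral of L = (m/2) x'^2 - k x^2 / 2."""
    S = 0.0
    for n in range(N):
        v = (path[n + 1] - path[n]) / h
        pot = 0.25 * k * (path[n] ** 2 + path[n + 1] ** 2)   # average of Phi at both ends
        S += h * (0.5 * m * v * v - pot)
    return S

def first_variation(path, eps=1e-3):
    """Central-difference estimate of dS[path + eps*eta]/d(eps) at eps = 0."""
    eta = [math.sin(math.pi * t / T) for t in ts]
    plus = [x + eps * e for x, e in zip(path, eta)]
    minus = [x - eps * e for x, e in zip(path, eta)]
    return (action(plus) - action(minus)) / (2 * eps)

true_path = [math.cos(omega * t) for t in ts]                    # solves m x'' = -k x
line_path = [1.0 + (true_path[-1] - 1.0) * t / T for t in ts]    # same endpoints, not a solution

print("true path:", first_variation(true_path))
print("straight line:", first_variation(line_path))
```

Because this discrete action is exactly quadratic in the path, the central difference isolates the linear term. That term vanishes for the true path up to a discretization error of order $h^2$.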

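For $N$ non-interacting particles in an external potential, the Lagrangian is the sum of the single-particle Lagrangians:

$$L = \sum_{i=1}^{N} L_i = \sum_{i=1}^{N}\Big(-\Phi_i + \tfrac{1}{2}m_i|\dot{\mathbf{r}}_i|^2\Big) = \sum_{i=1}^{N}\Big[-\Phi_i + \tfrac{1}{2}m_i\big(\dot{x}_i^2+\dot{y}_i^2+\dot{z}_i^2\big)\Big],$$

where $\Phi_i = \Phi(x_i, y_i, z_i)$. Note that this can be written as

$$L = \text{KE} - \Phi, \quad\text{where}\quad \text{KE} = \sum_{i=1}^{N}\tfrac{1}{2}m_i\big(\dot{x}_i^2+\dot{y}_i^2+\dot{z}_i^2\big) \quad\text{and}\quad \Phi = \sum_{i=1}^{N}\Phi_i. \qquad (30.39)$$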

If the particles are interacting, then $\Phi$ is no longer the sum of individual potentials, but a general function of all coordinates. It is therefore common to collect all the $N$ coordinate triples into one big $3N$-component vector $\mathbf{q}$ and call it the generalized coordinates vector. Then the Lagrangian is written as

$$L(\mathbf{q}, \dot{\mathbf{q}}, t) = \text{KE} - \Phi = \sum_{i=1}^{3N}\tfrac{1}{2}\mu_i\dot{q}_i^2 - \Phi(q_1, q_2, \ldots, q_{3N}). \qquad (30.40)$$

We changed the mass to $\mu_i$ to avoid confusion with the $m_i$ of the previous equation. For example, for three particles interacting gravitationally,

$$\Phi(q_1, q_2, \ldots, q_9) = \Phi(\mathbf{r}_1, \mathbf{r}_2, \mathbf{r}_3) = -\frac{Gm_1m_2}{|\mathbf{r}_1-\mathbf{r}_2|} - \frac{Gm_1m_3}{|\mathbf{r}_1-\mathbf{r}_3|} - \frac{Gm_2m_3}{|\mathbf{r}_2-\mathbf{r}_3|},$$

which can be written in terms of the $q$'s, once the latter are defined in terms of the position vectors. Note that many of the $\mu_i$'s in (30.40) are equal. For instance, if $q_1 = x_1$, $q_2 = y_1$, and $q_3 = z_1$, then $\mu_1 = \mu_2 = \mu_3 = m_1$, etc.
Figure 30.4: The inclined plane moves as m moves on it.


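Example 30.2.1. A block of mass $m$ slides on a frictionless inclined plane, which has mass $M$ and moves on a frictionless horizontal surface as shown in Figure 30.4. The position of the incline is denoted by $X$ and that of the block by $r$, or $(x, y)$, with

$$x = X + r\cos\theta, \qquad y = (l - r)\sin\theta,$$

where $l$ is the length of the inclined plane and $\theta$ its angle of inclination. The kinetic energy of the system is

$$\text{KE} = \tfrac{1}{2}M\dot{X}^2 + \tfrac{1}{2}m\big(\dot{x}^2+\dot{y}^2\big) = \tfrac{1}{2}M\dot{X}^2 + \tfrac{1}{2}m\Big[\big(\dot{X}+\dot{r}\cos\theta\big)^2 + \dot{r}^2\sin^2\theta\Big] = \tfrac{1}{2}M\dot{X}^2 + \tfrac{1}{2}m\big(\dot{X}^2+\dot{r}^2+2\dot{X}\dot{r}\cos\theta\big),$$

and the potential energy is

$$\Phi = mgy = mg(l - r)\sin\theta,$$

giving rise to the Lagrangian

$$L = \tfrac{1}{2}M\dot{X}^2 + \tfrac{1}{2}m\big(\dot{X}^2+\dot{r}^2+2\dot{X}\dot{r}\cos\theta\big) - mg(l - r)\sin\theta.$$

The equations of motion

$$\frac{\partial L}{\partial X} - \frac{d}{dt}\left(\frac{\partial L}{\partial \dot{X}}\right) = 0, \qquad \frac{\partial L}{\partial r} - \frac{d}{dt}\left(\frac{\partial L}{\partial \dot{r}}\right) = 0$$

can now be calculated:

$$-M\ddot{X} - m\big(\ddot{X} + \ddot{r}\cos\theta\big) = 0, \qquad mg\sin\theta - m\big(\ddot{r} + \ddot{X}\cos\theta\big) = 0.$$

Solving for the two accelerations, we get

$$\ddot{X} = \frac{-mg\sin\theta\cos\theta}{M + m\sin^2\theta}, \qquad \ddot{r} = \frac{(m+M)g\sin\theta}{M + m\sin^2\theta}.$$

Note that for an infinitely heavy inclined plane, $\ddot{X} = 0$ and $\ddot{r} = g\sin\theta$, as expected. ■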
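The two accelerations can be double-checked by substituting them back into the equations of motion. In the Python sketch below (the parameter values are arbitrary assumptions), `incline_accelerations` evaluates the closed-form results of the example, and `residuals` returns the left-hand sides of the two Euler-Lagrange equations, which should vanish up to rounding.

```python
import math

def incline_accelerations(m, M, theta, g=9.81):
    """X'' and r'' from Example 30.2.1."""
    s, c = math.sin(theta), math.cos(theta)
    denom = M + m * s * s
    return -m * g * s * c / denom, (m + M) * g * s / denom

def residuals(m, M, theta, g=9.81):
    """Left-hand sides of the X and r equations of motion; both should vanish."""
    X_dd, r_dd = incline_accelerations(m, M, theta, g)
    s, c = math.sin(theta), math.cos(theta)
    eq_X = -M * X_dd - m * (X_dd + r_dd * c)
    eq_r = m * g * s - m * (r_dd + X_dd * c)
    return eq_X, eq_r

print(residuals(2.0, 5.0, math.radians(30)))
print(incline_accelerations(2.0, 1e12, math.radians(30)))   # nearly (0, g sin 30 deg)
```

Because $L$ does not depend on $X$, the first equation also says that the total horizontal momentum $M\dot{X} + m(\dot{X} + \dot{r}\cos\theta)$ is conserved. Plugging results back in like this is a cheap way to catch algebra slips.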


Joseph Louis Lagrange (1736-1813) was born Giuseppe Luigi Lagrangia but adopted the French version of his name. He was the eldest of eleven children, most of whom did not reach adulthood. His father destined him for the law, a profession that one of his brothers later pursued, and Lagrange offered no objections. But having begun the study of physics and geometry, he quickly became aware of his talents and henceforth devoted himself to the exact sciences. Attracted first by geometry, at the age of seventeen he turned to analysis, then a rapidly developing field.

In 1755, in a letter to the geometer Giulio da Fagnano, Lagrange speaks of one of Euler's papers published at Lausanne and Geneva in 1744. The same letter shows that as early as the end of 1754 Lagrange had found interesting results in this area, which was to become the calculus of variations (a term coined by Euler in 1766). In the same year, Lagrange sent Euler a summary, written in Latin, of the purely analytical method that he used for this type of problem. Euler replied to Lagrange that he was very interested in the technique. Lagrange's merit was likewise recognized in Turin; and he was named, by a royal decree, professor at the Royal Artillery School with an annual salary of 250 crowns, a sum never increased in all the years he remained in his native country. Many years later, in a letter to d'Alembert, Lagrange confirmed that this method of maxima and minima was the first fruit of his studies (he was only nineteen when he devised it) and that he regarded it as his best work in mathematics.

In 1756, in a letter to Euler that has been lost, Lagrange, applying the calculus of variations to mechanics, generalized Euler's earlier work on the trajectory described by a material point subject to the influence of central forces to an arbitrary system of bodies, and derived from it a procedure for solving all the problems of dynamics.

In 1757 some young Turin scientists, among them Lagrange, founded a scientific society that was the origin of the Royal Academy of Sciences of Turin. One of the main goals of this society was the publication of a miscellany in French and Latin, Miscellanea Taurinensia ou Mélanges de Turin, to which Lagrange contributed fundamentally. These contributions included works on the calculus of variations, probability, vibrating strings, and the principle of least action.

To enter a competition for a prize, in 1763 Lagrange sent to the Paris Academy of Sciences a memoir in which he provided a satisfactory explanation of the translational motion of the moon. In the meantime, the Marquis Caraccioli, ambassador from the kingdom of Naples to the court of Turin, was transferred by his government to London. He took along the young Lagrange, who until then seems never to have left the immediate vicinity of Turin. Lagrange was warmly received in Paris, where he had been preceded by his memoir on lunar libration. He may perhaps have been treated too well in the Paris scientific community, where austerity was not a leading virtue. Being of a delicate constitution, Lagrange fell ill and had to interrupt his trip. In the spring of 1765 Lagrange returned to Turin by way of Geneva.

In the autumn of 1765 d'Alembert, who was on excellent terms with Frederick II of Prussia and familiar with Lagrange's work through Mélanges de Turin, suggested to Lagrange that he accept the vacant position in Berlin created by Euler's departure for St. Petersburg. It seems quite likely that Lagrange would gladly have remained in Turin had the court of Turin been willing to improve his material and scientific situation. On 26 April, d'Alembert transmitted to Lagrange the very precise and advantageous propositions of the king of Prussia. Lagrange accepted the proposals of the Prussian king and, not without difficulties, obtained his leave through the intercession of Frederick II with the king of Sardinia. Eleven months after his arrival in Berlin, Lagrange married his cousin Vittoria Conti, who died in 1783 after a long illness. With the death of Frederick II in August 1786 he also lost his strongest support in Berlin. Advised of the situation, the princes of Italy zealously competed in attracting him to their courts. In the meantime the French government decided to bring Lagrange to Paris through an advantageous offer. Of all the candidates, Paris was victorious.
Lagrange left Berlin on 18 May 1787 to become pensionnaire veteran of the Paris Academy of Sciences, of which he had been a foreign associate member since 1772. Warmly welcomed in Paris, he experienced a certain lassitude and did not immediately resume his research. Yet he astonished those around him by his extensive knowledge of metaphysics, history, religion, linguistics, medicine, and botany.

In 1792 Lagrange married the daughter of his colleague at the Academy, the 
astronomer Pierre Charles Le Monnier. This was a troubled period, about a year 
after the flight of the king and his arrest at Varennes. Nevertheless, on 3 June the 
royal family signed the marriage contract “as a sign of its agreement to the union.” 
Lagrange had no children from this second marriage, which, like the first, was a 
happy one. 

When the academy was suppressed in 1793, many noted scientists, including 
Lavoisier, Laplace, and Coulomb were purged from its membership; but Lagrange 
remained as its chairman. For the next ten years, Lagrange survived the turmoil of 
the aftermath of the French Revolution, but by March of 1813, he became seriously 
ill. He died on the morning of 11 April 1813, and three days later his body was 
carried to the Pantheon. The funeral oration was given by Laplace in the name of 
the Senate. 


30.2.2 Lagrangian Densities 

Particles are localized objects (indeed mathematical points), whose trajectories, determined by ordinary differential equations, describe curves in space. A Lagrangian of the form (30.40), with one independent variable (time), is therefore appropriate for particles.

Most physical quantities, however, are not particles but fields, which are not localized. In order to apply the variational techniques to fields, one has to consider a Lagrangian density $\mathcal{L}$, whose integral over some volume gives the Lagrangian, which can then be integrated over time as in (30.2). Thus, in field theories, the integration is over four-dimensional spacetime, a natural setting in which relativity can operate; this is very relevant, because most field theories are relativistic. A physical field usually has several components, making Equation (30.23) relevant to the situation.


Electrodynamics Lagrangian 

Section 17.3.2 derived the electromagnetic field tensor $F_{\alpha\beta}$ and wrote the four Maxwell equations in terms of it. Since $F_{\alpha\beta}$ seems to be so fundamental, and the variational techniques seem to yield the (partial) differential equations of physics, there may be a chance that electrodynamics can be described by a Lagrangian density. In the language of tensors, a Lagrangian density is a scalar. Thus, we have to construct a scalar out of $F_{\alpha\beta}$. The simplest such scalar is $F^{\alpha\beta}F_{\alpha\beta}$. Equation (17.47) showed that the field tensor can be written as derivatives of the 4-potential $A_\alpha$, which is therefore more "fundamental" than $F_{\alpha\beta}$. There is another 4-vector appearing in Maxwell's equations, namely the 4-current $J^\alpha$. Thus, by taking the dot product $J^\alpha A_\alpha$, we form another scalar. We therefore write

$$\mathcal{L} = aF^{\alpha\beta}F_{\alpha\beta} + bJ^\alpha A_\alpha = a\,\eta^{\alpha\mu}\eta^{\beta\nu}F_{\mu\nu}F_{\alpha\beta} + bJ^\alpha A_\alpha,$$

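where $a$ and $b$ are to be determined later. Writing the field tensor in terms of the 4-potential, we get

$$\mathcal{L} = a\,\eta^{\alpha\mu}\eta^{\beta\nu}\big(\partial_\mu A_\nu - \partial_\nu A_\mu\big)\big(\partial_\alpha A_\beta - \partial_\beta A_\alpha\big) + bJ^\alpha A_\alpha = a\,\eta^{\alpha\mu}\eta^{\beta\nu}\big(A_{\nu,\mu} - A_{\mu,\nu}\big)\big(A_{\beta,\alpha} - A_{\alpha,\beta}\big) + bJ^\alpha A_\alpha. \qquad (30.41)$$

The Euler-Lagrange equation for $A_\sigma$ can be written as

$$\frac{\partial\mathcal{L}}{\partial A_\sigma} - \frac{\partial}{\partial x^\rho}\frac{\partial\mathcal{L}}{\partial A_{\sigma,\rho}} = 0. \qquad (30.42)$$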


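The first term is easy to calculate:

$$\frac{\partial\mathcal{L}}{\partial A_\sigma} = bJ^\alpha\frac{\partial A_\alpha}{\partial A_\sigma} = bJ^\alpha\delta_\alpha^\sigma = bJ^\sigma.$$

The second term is only slightly more complicated once we realize that

$$\frac{\partial A_{\beta,\alpha}}{\partial A_{\sigma,\rho}} = \delta_\alpha^\rho\,\delta_\beta^\sigma.$$

With this in mind, the second term of (30.42) can be shown to be

$$\frac{\partial}{\partial x^\rho}\frac{\partial\mathcal{L}}{\partial A_{\sigma,\rho}} = 4a\,\partial_\rho\big(\partial^\rho A^\sigma - \partial^\sigma A^\rho\big) = 4a\,\partial_\rho F^{\rho\sigma},$$

and (30.42) becomes

$$4a\,\partial_\rho F^{\rho\sigma} = bJ^\sigma.$$

This becomes Maxwell's first and fourth equations combined [see Equation (17.45)] if $a = \frac{1}{4}$ and $b = \mu_0$. Thus the Lagrangian density for electrodynamics is

$$\mathcal{L} = \tfrac{1}{4}\eta^{\alpha\mu}\eta^{\beta\nu}\big(A_{\nu,\mu} - A_{\mu,\nu}\big)\big(A_{\beta,\alpha} - A_{\alpha,\beta}\big) + \mu_0 J^\alpha A_\alpha. \qquad (30.43)$$

This, like any other Lagrangian, can be multiplied by a constant without affecting the Euler-Lagrange equations.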
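The scalar $F^{\alpha\beta}F_{\alpha\beta}$ in (30.41) equals $2(|\mathbf{B}|^2 - |\mathbf{E}|^2)$ in units with $c = 1$. This is why $\frac{1}{4}F^{\alpha\beta}F_{\alpha\beta}$ turns into $\frac{1}{2}(|\mathbf{B}|^2 - |\mathbf{E}|^2)$. The Python sketch below checks this numerically for random fields. It assumes the signature $(+,-,-,-)$ and the common convention $F_{0i} = E_i$, $F_{ij} = -\epsilon_{ijk}B_k$. The sign conventions of Section 17.3.2 may differ, but the invariant is quadratic in each entry and does not depend on them.

```python
import random

ETA = (1.0, -1.0, -1.0, -1.0)   # Minkowski metric diag(+1, -1, -1, -1), c = 1

def field_tensor(E, B):
    """Covariant F_{mu nu} with F_{0i} = E_i and F_{ij} = -eps_{ijk} B_k (assumed convention)."""
    Ex, Ey, Ez = E
    Bx, By, Bz = B
    return [
        [0.0,  Ex,  Ey,  Ez],
        [-Ex, 0.0, -Bz,  By],
        [-Ey,  Bz, 0.0, -Bx],
        [-Ez, -By,  Bx, 0.0],
    ]

def invariant(F):
    """F^{mu nu} F_{mu nu}; with a diagonal metric, raising both indices multiplies by ETA[mu]*ETA[nu]."""
    return sum(ETA[mu] * ETA[nu] * F[mu][nu] ** 2 for mu in range(4) for nu in range(4))

random.seed(1)
E = [random.uniform(-1, 1) for _ in range(3)]
B = [random.uniform(-1, 1) for _ in range(3)]
F = field_tensor(E, B)
print(invariant(F), 2 * (sum(x * x for x in B) - sum(x * x for x in E)))
```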

Example 30.2.2. Charged Particle in EM Field. Problem 30.17 shows that the Lagrangian density (30.43) can be written as

$$\mathcal{L} = \tfrac{1}{2}\big(|\mathbf{B}|^2 - |\mathbf{E}|^2\big) + \mu_0\big(\rho\Phi - \mathbf{J}\cdot\mathbf{A}\big),$$

with the variational problem

$$L = \int_a^b dt\int \mathcal{L}\,d^3x'.$$


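Now consider a single particle of charge $q$ interacting with an electromagnetic field. For such a particle,

$$\rho = q\,\delta(\mathbf{r} - \mathbf{r}') \quad\text{and}\quad \mathbf{J} = \rho\mathbf{v} = q\mathbf{v}\,\delta(\mathbf{r} - \mathbf{r}'),$$

and $L$ becomes

$$L = \tfrac{1}{2}\int_a^b dt\int\big(|\mathbf{B}|^2 - |\mathbf{E}|^2\big)\,d^3x' + \mu_0\int_a^b dt\int\big(q\Phi\,\delta(\mathbf{r}-\mathbf{r}') - q\mathbf{v}\cdot\mathbf{A}\,\delta(\mathbf{r}-\mathbf{r}')\big)\,d^3x'$$

$$\phantom{L} = \tfrac{1}{2}\int_a^b dt\int\big(|\mathbf{B}|^2 - |\mathbf{E}|^2\big)\,d^3x' + \mu_0 q\int_a^b dt\,\big\{\Phi(\mathbf{r},t) - \mathbf{v}\cdot\mathbf{A}(\mathbf{r},t)\big\}.$$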

The particle also has kinetic energy, which needs to be added to this Lagrangian. When adding Lagrangians, one has to take into account the freedom of multiplying Lagrangians by constants. In the case at hand, the kinetic energy of the particle should be added to the negative of the scalar potential energy (recall that $L = \text{KE} - \Phi$). To ensure this, we have to multiply the entire EM Lagrangian by $-1/\mu_0$ and add it to the kinetic energy of the particle. Hence the total Lagrangian becomes

$$L = -\frac{1}{2\mu_0}\int_a^b dt\int\big(|\mathbf{B}|^2 - |\mathbf{E}|^2\big)\,d^3x' + \int_a^b dt\,\Big\{\tfrac{1}{2}m|\mathbf{v}|^2 - q\Phi(\mathbf{r},t) + q\mathbf{v}\cdot\mathbf{A}(\mathbf{r},t)\Big\}.$$

Notice how the first integral is four-dimensional while the second integral is over a single variable.

We are interested in the motion of the particle. Therefore, the first integral is just a constant (independent of the coordinates and velocity components of the particle) and can be dropped. Thus, substituting $\dot{\mathbf{r}}$ for $\mathbf{v}$, we have

$$L_{\text{part}} = \int_a^b dt\,\Big\{\tfrac{1}{2}m|\dot{\mathbf{r}}|^2 - q\Phi(\mathbf{r},t) + q\dot{\mathbf{r}}\cdot\mathbf{A}(\mathbf{r},t)\Big\},$$

with the Lagrangian

$$L(\mathbf{r}, \dot{\mathbf{r}}, t) = \tfrac{1}{2}m|\dot{\mathbf{r}}|^2 - q\Phi(\mathbf{r},t) + q\dot{\mathbf{r}}\cdot\mathbf{A}(\mathbf{r},t). \qquad (30.44)$$

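Let's look at the $x$-component of the motion:

$$\frac{\partial L}{\partial x} - \frac{d}{dt}\frac{\partial L}{\partial \dot{x}} = 0 \quad\text{or}\quad -q\frac{\partial\Phi}{\partial x} + q\dot{\mathbf{r}}\cdot\frac{\partial\mathbf{A}}{\partial x} - \frac{d}{dt}\big(m\dot{x} + qA_x\big) = 0,$$

or

$$m\ddot{x} + q\frac{\partial\Phi}{\partial x} + q\frac{dA_x}{dt} - q\dot{\mathbf{r}}\cdot\frac{\partial\mathbf{A}}{\partial x} = 0. \qquad (30.45)$$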


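Now note that

$$\frac{dA_x}{dt} = \frac{\partial A_x}{\partial t} + \frac{\partial A_x}{\partial x}\dot{x} + \frac{\partial A_x}{\partial y}\dot{y} + \frac{\partial A_x}{\partial z}\dot{z}$$

and

$$\dot{\mathbf{r}}\cdot\frac{\partial\mathbf{A}}{\partial x} = \dot{x}\frac{\partial A_x}{\partial x} + \dot{y}\frac{\partial A_y}{\partial x} + \dot{z}\frac{\partial A_z}{\partial x}.$$

Putting these two equations in (30.45) and rearranging, we obtain

$$m\ddot{x} + q\underbrace{\left(\frac{\partial\Phi}{\partial x} + \frac{\partial A_x}{\partial t}\right)}_{=-E_x\ \text{by (15.31)}} + q\dot{y}\underbrace{\left(\frac{\partial A_x}{\partial y} - \frac{\partial A_y}{\partial x}\right)}_{=-B_z\ \text{by (15.31)}} + q\dot{z}\underbrace{\left(\frac{\partial A_x}{\partial z} - \frac{\partial A_z}{\partial x}\right)}_{=B_y\ \text{by (15.31)}} = 0,$$

or

$$m\ddot{x} - qE_x - q\big(\dot{y}B_z - \dot{z}B_y\big) = 0. \qquad (30.46)$$

The expression in parentheses is just the $x$-component of $\mathbf{v}\times\mathbf{B}$. Thus, (30.46) is the $x$-component of the Lorentz force law, governing the motion of a charged particle in an electromagnetic field. ■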
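The rearrangement that produced (30.46) rests on the identity $\dot{\mathbf{r}}\cdot\partial\mathbf{A}/\partial x - (\dot{\mathbf{r}}\cdot\nabla)A_x = (\dot{\mathbf{r}}\times\mathbf{B})_x$, with $\mathbf{B} = \nabla\times\mathbf{A}$. The sketch below checks it numerically for one hypothetical vector potential, chosen only for the test, whose curl has been worked out by hand. Derivatives of $\mathbf{A}$ are taken by central differences.

```python
import math

def A(x, y, z):
    """Hypothetical vector potential used only for this check."""
    return (y * z + math.sin(y), x * x * z, math.cos(x) + x * y)

def B_exact(x, y, z):
    """curl A for the potential above, worked out by hand."""
    return (x - x * x, math.sin(x), 2 * x * z - z - math.cos(y))

def partial(f, point, i, comp, h=1e-5):
    """Central difference for d f_comp / d x_i at point."""
    plus, minus = list(point), list(point)
    plus[i] += h
    minus[i] -= h
    return (f(*plus)[comp] - f(*minus)[comp]) / (2 * h)

def velocity_terms_x(point, v):
    """v . dA/dx - (v . grad) A_x: the velocity-dependent terms that become (v x B)_x."""
    return (sum(v[j] * partial(A, point, 0, j) for j in range(3))
            - sum(v[i] * partial(A, point, i, 0) for i in range(3)))

point, v = (0.3, -0.7, 1.1), (0.5, -1.2, 2.0)
Bx, By, Bz = B_exact(*point)
print(velocity_terms_x(point, v), v[1] * Bz - v[2] * By)   # second value is (v x B)_x
```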


Klein-Gordon Lagrangian 

One of the first attempts at combining the special theory of relativity with quantum mechanics was made by Oskar Klein and Walter Gordon. In fact, Schrödinger himself started with the relativistic version of his equation, but abandoned it because of some difficulty he encountered when applying it to the hydrogen atom. By the usual substitution

$$E \to i\hbar\frac{\partial}{\partial t}, \qquad \mathbf{p} \to -i\hbar\nabla$$

in the relativistic equation $E^2/c^2 - \mathbf{p}\cdot\mathbf{p} = m^2c^2$, Klein and Gordon derived the equation that now bears their names:

$$\frac{1}{c^2}\frac{\partial^2\phi}{\partial t^2} - \nabla^2\phi + \frac{m^2c^2}{\hbar^2}\phi = 0,$$

which, in units $\hbar = 1 = c$, becomes

$$\frac{\partial^2\phi}{\partial t^2} - \nabla^2\phi + m^2\phi = 0.$$

This equation can also be obtained from the Lagrangian density

$$\mathcal{L} = \eta^{\alpha\beta}(\partial_\alpha\phi)(\partial_\beta\phi) - m^2\phi^2, \qquad (30.47)$$

as the reader can verify.
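A plane wave gives a quick check of the Klein-Gordon equation in units $\hbar = c = 1$. The function $\phi = \cos(\omega t - kx)$ satisfies it exactly when $\omega^2 = k^2 + m^2$, which is the relation $E^2 = p^2 + m^2$ behind the substitution above. The Python sketch below uses one spatial dimension, with arbitrary assumed values of $m$ and $k$. It evaluates $\partial_t^2\phi - \partial_x^2\phi + m^2\phi$ by finite differences, once for the correct frequency and once for a deliberately wrong (massless) one.

```python
import math

def kg_residual(phi, t, x, mass, h=1e-3):
    """Finite-difference value of phi_tt - phi_xx + mass**2 * phi (hbar = c = 1, one space dimension)."""
    phi_tt = (phi(t + h, x) - 2 * phi(t, x) + phi(t - h, x)) / h**2
    phi_xx = (phi(t, x + h) - 2 * phi(t, x) + phi(t, x - h)) / h**2
    return phi_tt - phi_xx + mass**2 * phi(t, x)

mass, k = 1.5, 2.0                          # assumed mass and wave number
omega = math.sqrt(k**2 + mass**2)           # dispersion relation E^2 = p^2 + m^2

def good_wave(t, x):
    return math.cos(omega * t - k * x)

def bad_wave(t, x):
    return math.cos(k * t - k * x)          # massless dispersion, wrong when mass > 0

print(kg_residual(good_wave, 0.4, -0.3, mass))   # close to zero
print(kg_residual(bad_wave, 0.0, 0.0, mass))     # close to mass**2
```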


30.3 Hamiltonian Dynamics 

The Lagrangian formulation of mechanics treated in the previous section is a powerful tool for studying many different dynamical systems and fields. Furthermore, considerations of symmetry, an indispensable technique in the investigation of fundamental forces, are most adequately handled in the Lagrangian language. Once the Lagrangian is known, the Euler-Lagrange equations provide second-order differential equations to be solved under given boundary (or initial) conditions.

There is another formulation of mechanics, which, instead of second-order differential equations, yields twice as many first-order DEs. It is called the Hamiltonian formulation. We describe only the case of several dependent variables and one independent variable, the other cases being very similar. Let us assume that our dynamical system has $n$ generalized coordinates $\{q_i\}_{i=1}^{n}$ and a Lagrangian $L(\mathbf{q}, \dot{\mathbf{q}}, t)$. In the simplest case (30.40), $L = \text{KE} - \Phi$, where KE is a quadratic term in velocities alone and $\Phi$ depends on the coordinates alone. In such a case,

$$\frac{\partial L}{\partial \dot{q}_j} = \frac{\partial\,\text{KE}}{\partial \dot{q}_j} = \mu_j\dot{q}_j,$$

which is the momentum associated with the $j$th generalized coordinate. It is therefore natural to generalize the concept of momentum as well, write

$$p_j = \frac{\partial L(\mathbf{q}, \dot{\mathbf{q}}, t)}{\partial \dot{q}_j}, \qquad (30.48)$$

and call $p_j$ so defined the generalized momentum of the dynamical system.

The transition from the Lagrangian to the Hamiltonian formulation, from a strictly mathematical standpoint, is to go from the set of variables $(\mathbf{q}, \dot{\mathbf{q}}, t)$ to $(\mathbf{q}, \mathbf{p}, t)$. The procedure for making this transition is the Legendre transformation discussed in Section 2.2.2. To find the variables involved, consider the differential of the Lagrangian:

$$dL = \sum_{i=1}^{n}\left(\frac{\partial L}{\partial q_i}dq_i + \frac{\partial L}{\partial \dot{q}_i}d\dot{q}_i\right) + \frac{\partial L}{\partial t}dt,$$

and use (30.48) and the Euler-Lagrange equation, $\dot{p}_i = \partial L/\partial q_i$, to rewrite the above as

$$dL = \sum_{i=1}^{n}\big(\dot{p}_i\,dq_i + p_i\,d\dot{q}_i\big) + \frac{\partial L}{\partial t}dt.$$

If we want to switch the independent variable from $\dot{q}_i$ to $p_i$, then we have to define the Hamiltonian as

$$H(\mathbf{q}, \mathbf{p}, t) = \sum_{i=1}^{n}p_i\dot{q}_i - L(\mathbf{q}, \dot{\mathbf{q}}, t). \qquad (30.49)$$


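To verify this, we note that

$$dH = \sum_{i=1}^{n}\big(\dot{q}_i\,dp_i + p_i\,d\dot{q}_i\big) - dL = \sum_{i=1}^{n}\big(\dot{q}_i\,dp_i + p_i\,d\dot{q}_i\big) - \sum_{i=1}^{n}\big(\dot{p}_i\,dq_i + p_i\,d\dot{q}_i\big) - \frac{\partial L}{\partial t}dt.$$

Note that the $p_i\,d\dot{q}_i$ terms cancel, and we are left with

$$dH = \sum_{i=1}^{n}\big(\dot{q}_i\,dp_i - \dot{p}_i\,dq_i\big) - \frac{\partial L}{\partial t}dt.$$

On the other hand,

$$dH = \sum_{i=1}^{n}\left(\frac{\partial H}{\partial q_i}dq_i + \frac{\partial H}{\partial p_i}dp_i\right) + \frac{\partial H}{\partial t}dt.$$

Comparison of the last two equations gives

$$\dot{q}_i = \frac{\partial H}{\partial p_i}, \qquad \dot{p}_i = -\frac{\partial H}{\partial q_i}, \qquad \frac{\partial L}{\partial t} = -\frac{\partial H}{\partial t}, \qquad i = 1, 2, \ldots, n, \qquad (30.50)$$

which are called Hamilton's equations, or the canonical equations. Note that instead of $n$ second-order DEs, we now have $2n$ first-order DEs.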

To discover the physical significance of the Hamiltonian, consider the familiar simple Lagrangian $L = \text{KE} - \Phi$, where KE is the usual kinetic energy term and $\Phi$ is the potential energy, which is independent of velocities. Then (30.48) yields $p_i = \mu_i\dot{q}_i$, and

$$H = \sum_{i=1}^{n}p_i\dot{q}_i - L = \sum_{i=1}^{n}\mu_i\dot{q}_i^2 - \text{KE} + \Phi = 2\,\text{KE} - \text{KE} + \Phi = \text{KE} + \Phi.$$

So $H$ is the sum of the kinetic and potential energies, i.e., the total energy.
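Hamilton's equations (30.50) are $2n$ first-order equations that can be handed directly to a standard integrator. When $H$ has no explicit time dependence, $H$ (here the total energy) is conserved along the motion. The Python sketch below illustrates both points for a hypothetical pendulum with $H = p^2/(2m\ell^2) - mg\ell\cos q$. The partial derivatives of $H$ are taken by central differences, and the classical fourth-order Runge-Kutta method serves as the integrator; both are choices made here, not methods discussed in the text.

```python
import math

m, g, ell = 1.0, 9.81, 1.0          # assumed pendulum parameters

def H(q, p):
    """Pendulum Hamiltonian KE + Phi, with q the angle and p = m ell^2 q_dot."""
    return p * p / (2 * m * ell**2) - m * g * ell * math.cos(q)

def hamilton_rhs(q, p, d=1e-5):
    """Right-hand sides of (30.50): q_dot = dH/dp, p_dot = -dH/dq (central differences)."""
    dH_dp = (H(q, p + d) - H(q, p - d)) / (2 * d)
    dH_dq = (H(q + d, p) - H(q - d, p)) / (2 * d)
    return dH_dp, -dH_dq

def rk4_step(q, p, dt):
    """One classical Runge-Kutta step for the pair of first-order equations."""
    k1 = hamilton_rhs(q, p)
    k2 = hamilton_rhs(q + 0.5 * dt * k1[0], p + 0.5 * dt * k1[1])
    k3 = hamilton_rhs(q + 0.5 * dt * k2[0], p + 0.5 * dt * k2[1])
    k4 = hamilton_rhs(q + dt * k3[0], p + dt * k3[1])
    q_new = q + dt * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6
    p_new = p + dt * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6
    return q_new, p_new

q, p, dt = 1.0, 0.0, 1e-3           # released from rest at 1 rad
E0 = H(q, p)
for _ in range(10_000):             # 10 seconds of motion
    q, p = rk4_step(q, p, dt)
print(E0, H(q, p))                  # total energy before and after
```

Runge-Kutta is not symplectic, so over very long runs a slow energy drift would appear. Over this interval and step size it stays far below the tolerance checked here.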


Example 30.3.1. Hamiltonian of a Charged Particle in EM Field. The Lagrangian of a charged particle in an electromagnetic field is given by (30.44). Let's find the Hamiltonian of this system. First we need the generalized momentum (30.48):

$$p_i = \frac{\partial L}{\partial \dot{x}_i} = m\dot{x}_i + qA_i \quad\text{or}\quad \mathbf{p} = m\dot{\mathbf{r}} + q\mathbf{A}. \qquad (30.51)$$

This is an important equation in its own right. It says that the momentum of the system is not just that of the particle, but that it also includes a contribution from the EM field. In particular, the EM field carries momentum.¹

To find the Hamiltonian, compute $\dot{\mathbf{r}}$ from (30.51):

$$\dot{\mathbf{r}} = \frac{\mathbf{p} - q\mathbf{A}}{m},$$

and substitute in the definition of the Hamiltonian (30.49), where, in this case, the sum is just the dot product:

$$H(\mathbf{r}, \mathbf{p}, t) = \mathbf{p}\cdot\frac{\mathbf{p} - q\mathbf{A}}{m} - \frac{1}{2}m\left|\frac{\mathbf{p} - q\mathbf{A}}{m}\right|^2 + q\Phi - q\,\frac{\mathbf{p} - q\mathbf{A}}{m}\cdot\mathbf{A} = (\mathbf{p} - q\mathbf{A})\cdot\frac{\mathbf{p} - q\mathbf{A}}{m} - \frac{|\mathbf{p} - q\mathbf{A}|^2}{2m} + q\Phi,$$

or

$$H(\mathbf{r}, \mathbf{p}, t) = \frac{|\mathbf{p} - q\mathbf{A}(\mathbf{r}, t)|^2}{2m} + q\Phi(\mathbf{r}, t). \qquad (30.52)$$

Thus, in the presence of an electromagnetic field, the Hamiltonian of a particle takes the same form as the total energy of a particle in a potential $q\Phi$, except that in the expression for the KE part, $\mathbf{p} - q\mathbf{A}$ replaces $\mathbf{p}$. Such a replacement is called minimal coupling and plays a key role in the quantum mechanical treatment of charged particles interacting with EM fields. ■
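The minimal-coupling Hamiltonian (30.52) can be checked against the Lagrangian results. The sketch below assumes a uniform magnetic field $\mathbf{B} = B_0\hat{\mathbf{z}}$ with the gauge choice $\mathbf{A} = \frac{1}{2}\mathbf{B}\times\mathbf{r}$, plus a uniform electric field derived from $\Phi = \phi_0 z$; all parameter values are hypothetical. Starting from a position and a velocity, it forms $\mathbf{p}$ from (30.51) and uses central differences to verify three things:

- $\partial H/\partial\mathbf{p}$ returns the velocity.
- $H$ equals $\frac{1}{2}m|\dot{\mathbf{r}}|^2 + q\Phi$.
- $-\partial H/\partial\mathbf{r}$ equals $q(\mathbf{E} + \dot{\mathbf{r}}\times\mathbf{B}) + q(\dot{\mathbf{r}}\cdot\nabla)\mathbf{A}$. This is what $\dot{\mathbf{p}} = m\ddot{\mathbf{r}} + q\,d\mathbf{A}/dt$ must equal for a static $\mathbf{A}$ if the Lorentz force law holds.

```python
charge, m, B0, phi0 = 1.6, 2.0, 0.8, 0.5      # assumed charge, mass, field parameters

def A(r):
    """A = (1/2) B x r for B = B0 z-hat (one possible gauge choice)."""
    return (-0.5 * B0 * r[1], 0.5 * B0 * r[0], 0.0)

def Phi(r):
    """Scalar potential of the uniform field E = -phi0 z-hat."""
    return phi0 * r[2]

def H(r, p):
    """Minimal-coupling Hamiltonian (30.52)."""
    a = A(r)
    return sum((p[i] - charge * a[i]) ** 2 for i in range(3)) / (2 * m) + charge * Phi(r)

def gradient(f, vec, d=1e-6):
    """Central-difference gradient of a scalar function of a 3-vector."""
    out = []
    for i in range(3):
        plus, minus = list(vec), list(vec)
        plus[i] += d
        minus[i] -= d
        out.append((f(plus) - f(minus)) / (2 * d))
    return out

r = [0.3, -1.1, 0.7]
v = [1.2, 0.4, -0.9]
a = A(r)
p = [m * v[i] + charge * a[i] for i in range(3)]             # canonical momentum (30.51)

r_dot = gradient(lambda pp: H(r, pp), p)                     # should reproduce v
p_dot = [-x for x in gradient(lambda rr: H(rr, p), r)]       # -dH/dr

E = (0.0, 0.0, -phi0)
v_cross_B = (v[1] * B0, -v[0] * B0, 0.0)
v_dot_grad_A = (-0.5 * B0 * v[1], 0.5 * B0 * v[0], 0.0)
expected_p_dot = [charge * (E[i] + v_cross_B[i] + v_dot_grad_A[i]) for i in range(3)]

print(r_dot, v)
print(p_dot, expected_p_dot)
print(H(r, p), 0.5 * m * sum(x * x for x in v) + charge * Phi(r))
```

Note that $\mathbf{p}$ differs from $m\dot{\mathbf{r}}$ here, yet $H$ still equals the kinetic plus potential energy, as the example concludes.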


1 This momentum is the source of radiation pressure. 






30.4 Problems 


Calculus of Variations 


30.1. Show that, in Example 30.1.6, $C_1 = 0$, $\lambda = \lambda_0$, and $C_2 = \sqrt{\lambda_0^2 - a^2}$, where $\lambda_0$ is the solution of the equation



30.2. Find the extremal of the functional

$$L[x, y] = \int_0^{\pi/2}\big(\dot{x}^2 + \dot{y}^2 + 2xy\big)\,dt$$

subject to the boundary conditions

$$x(0) = 0, \quad x(\pi/2) = 1, \quad y(0) = 0, \quad y(\pi/2) = 1.$$


30.3. Find the extremals of the following functionals:

(a) $L[x, y] = \int_a^b\big(\dot{x}^2 + \dot{y}^2 + \dot{x}\dot{y}\big)\,dt$, (b) $L[x, y] = \int_a^b\big(2xy - 2x^2 + \dot{x}^2 - \dot{y}^2\big)\,dt$.

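30.4. Find the extremal of a functional of the form

$$L[x, y] = \int_a^b L(\dot{x}, \dot{y})\,dt,$$

given that

$$\frac{\partial^2 L}{\partial \dot{x}^2}\,\frac{\partial^2 L}{\partial \dot{y}^2} - \left(\frac{\partial^2 L}{\partial \dot{x}\,\partial \dot{y}}\right)^2 \neq 0.$$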

30.5. Find the extremal of the functional

$$L[x] = \int_0^1\big(\dot{x}^2 + t^2\big)\,dt,$$

subject to the conditions

$$x(0) = 0, \quad x(1) = 0, \quad \int_0^1 x^2\,dt = 2.$$


30.6. Show that the extremization of (30.33) leads to the Euler-Lagrange 
equations (30.34). 

30.7. Among all triangles with a given base line and a fixed perimeter, show 
that the isosceles triangle has the largest area. 

30.8. An airplane with fixed air speed $V_0$ flies for a time $T$ on a closed curve. The wind velocity $\mathbf{u}$ is constant in magnitude and direction and $|\mathbf{u}| < V_0$. What closed curve encloses the largest area?
30.9. Among all curves joining a given point $(0, b)$ on the $y$-axis to a point on the $x$-axis and enclosing a given area $S$ together with the $x$-axis, find the curve that generates the least area when rotated about the $x$-axis.

30.10. An Atwood machine consists of two masses $m_1$ and $m_2$ connected by a light inextensible cord of length $l$, which passes over a pulley whose radius is $a$ and whose moment of inertia is $I$. Let $x$ denote the distance of $m_1$ from the top of the pulley. Using Lagrangian methods, show that the acceleration of $m_1$ is

$$\ddot{x} = \frac{g(m_1 - m_2)}{m_1 + m_2 + I/a^2}.$$

30.11. Using polar coordinates, write the Lagrangian of a particle of mass $m$ moving in a central force field with potential $\Phi(r)$. Show that the equations of motion are

$$m\ddot{r} = mr\dot{\theta}^2 - \frac{d\Phi}{dr}, \qquad \frac{d}{dt}\big(mr^2\dot{\theta}\big) = 0.$$

30.12. Using the Lagrangian method, find the acceleration of a solid sphere rolling without sliding down an inclined plane having an angle $\theta$ with the horizontal.

30.13. Using the Lagrangian method, find the acceleration of a solid sphere rolling without sliding down a movable wedge of mass $M$ having an angle $\theta$. The wedge moves on a frictionless horizontal surface.

30.14. Two blocks of equal mass $m$ are connected by an inextensible cord whose linear mass density is $\mu$. One block is placed on a smooth horizontal table, the other hangs over the edge of the table. What is the acceleration of the system? Use the Lagrangian method.

30.15. A simple pendulum of length $l$ and mass $m$ oscillates about its point of support, which is attached to a block of mass $M$ moving without friction along a horizontal line lying in the plane of the pendulum. Write the Lagrangian in terms of $x$, the position of $M$ on the horizontal line, and $\theta$, the angle $l$ makes with the vertical. Find the equations of motion of $m$ and $M$.

30.16. Find the equation of a curve describing the equilibrium position of 
a uniformly dense heavy flexible inextensible cord of length l fastened at its 
ends. Hint: The Lagrangian is just the potential energy written as an integral. 

30.17. Show that the Lagrangian density (30.43) can be written as

$$\mathcal{L} = \tfrac{1}{2}\big(|\mathbf{B}|^2 - |\mathbf{E}|^2\big) + \mu_0\big(\rho\Phi - \mathbf{J}\cdot\mathbf{A}\big).$$

Hint: See Sections 17.3.1 and 17.3.2 and be careful about possible changes of sign when raising or lowering indices.

30.18. Show that the Lagrangian density (30.47) leads to the Klein-Gordon 
equation. 




Chapter 31 


Nonlinear Dynamics 
and Chaos 


A variety of techniques, including the Frobenius method of infinite power series, could solve almost all linear DEs of physical interest. However, some very fundamental questions, such as the stability of the solar system, led to DEs that were not linear, and for such DEs no analytic (including series representation) solution existed. In the 1890s, Henri Poincaré, the great French mathematician, took upon himself the task of gleaning as much information as possible from the DEs describing the whole solar system. The result was the invention of one of the most powerful branches of mathematics (topology) and the realization that the qualitative analysis of (nonlinear) DEs could be very useful.

One of the discoveries made by Poincaré, which much later became the cornerstone of many developments, was that


Box 31.0.1. Unlike the linear DEs, nonlinear DEs may be very sensitive 
to the initial conditions. 


In other words, if a nonlinear system starts from some initial conditions and develops into a certain final configuration, then starting it with slightly different initial conditions may cause the system to develop into a final configuration completely different from the first one. This is in complete contrast to linear DEs, where two nearby initial conditions lead to nearby final configurations.

In general, the initial conditions are not known with infinite accuracy. Therefore, the final states of a nonlinear dynamical system may exhibit indeterministic behavior resulting from the initial (small) uncertainties. This is what has come to be known as chaos. The reader should note that the indeterminism discussed here has nothing to do with quantum indeterminism.


chaos due to 
uncertainty in 
initial conditions 
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All equations here are completely deterministic. It is the divergence of the 
initially nearby—and completely deterministic—trajectories that results in 
unpredictable final states. 

There are two general categories of systems exhibiting chaotic behavior: systems obeying iterated maps and systems obeying DEs. We shall study the first category in some detail, and only outline some of the general features of the much more complicated category of systems obeying DEs.


31.1 Systems Obeying Iterated Maps 

Consider the population of a species in consecutive years if the population is initially $N_0$. The simplest relation connecting $N_1$, the population after one year, to $N_0$ is

$$N_1 = aN_0,$$

where $a$ is a positive number depending on the environment in which the species lives. Under the most favorable conditions, $a$ is a large number, indicating rapid growth of population. Under less favorable conditions, $a$ will be small. And if the environment happens to be hostile, then $a$ will be smaller than one, indicating a decline in population.

The above equation is unrealistic because we know that if $a > 1$ and the population grows excessively, there will not be enough food to support the species. So, there must be a mechanism to suppress the growth. A more realistic equation should have a suppressive term which is small for small $N_0$ and grows for larger values of $N_0$. A possible term having such properties is one proportional to $N_0^2$. This leads to

$$N_1 = aN_0 - \beta N_0^2, \qquad 0 < \beta \ll a.$$

The minus sign causes the second term to decrease the population. Iterating this equation, we can find the population in the second, third, and subsequent years:

$$N_2 = aN_1 - \beta N_1^2, \quad N_3 = aN_2 - \beta N_2^2, \quad \ldots,$$

and, in general,

$$N_{k+1} = aN_k - \beta N_k^2. \tag{31.1}$$

It is customary to rewrite (31.1) in a slightly different form. First we note that since population cannot be negative, there exists a maximum number beyond which the population cannot grow. In order for $N_{k+1}$ to be positive, we must have

$$aN_k - \beta N_k^2 > 0 \quad\Longrightarrow\quad N_k < \frac{a}{\beta}$$

for all $k$. It follows that $N_{\max} = a/\beta$. Dividing (31.1) by $N_{\max}$ and setting $x_k \equiv N_k/N_{\max}$ yields

$$x_{k+1} = a x_k (1 - x_k), \tag{31.2}$$
where $x_k$ is the fraction of the maximum population of the species after $k$ years, and therefore, its value must lie between zero and one. Any equation of the form

$$x_{k+1} = f_a(x_k), \tag{31.3}$$

where $a$ is a control parameter (as in the case of the logistic map), and in which the value of some (discrete) quantity at $k+1$ is given in terms of its value at $k$, is called an iterated map, and the function $f_a$ is called the iterated map function. The particular function in (31.2) is called the logistic map function.

Starting from an initial value $x_0$, one can generate a sequence of $x$ values by consecutively substituting in the RHS of (31.3). This sequence is called a trajectory or orbit of the iterated map.
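As a minimal sketch (in Python, chosen here only for illustration), an orbit of the logistic map (31.2) can be generated by feeding each value back into the map function. The helper names `logistic` and `orbit` are hypothetical, not from the text.

```python
def logistic(a, x):
    """Logistic map function f_a(x) = a x (1 - x) of Equation (31.2)."""
    return a * x * (1.0 - x)


def orbit(f, a, x0, n):
    """Return the first n + 1 points x_0, x_1, ..., x_n of the orbit x_{k+1} = f(a, x_k)."""
    xs = [x0]
    for _ in range(n):
        xs.append(f(a, xs[-1]))
    return xs


# A hostile environment (a = 0.5) and a more favorable one (a = 2).
print(orbit(logistic, 0.5, 0.8, 5))
print(orbit(logistic, 2.0, 0.1, 20)[-1])  # very close to 0.5
```

With $a = 0.5$ every value is smaller than the previous one, while with $a = 2$ the orbit settles near 0.5; the fixed points discussed next explain both behaviors.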

31.1.1 Stable and Unstable Fixed Points 

It is clear that the first few points of an orbit depend on the starting point. What may not be so clear is that, for a given $a$, the eventual behavior of the orbit is fairly insensitive to the starting point. There are, however, some starting points which are manifestly different from others. For example, in the logistic map, if $x_0 = 0$, no other point will be produced by iteration, because $f_a(0) = 0$, i.e., $f_a(x_0) = x_0$, and further application of $f_a$ will not produce any new values of $x$. In general, a point $x_a$ which has the property that

$$f_a(x_a) = x_a \tag{31.4}$$

is called a fixed point of the iterated map associated with $a$. For the logistic map we have

$$x_a = a x_a (1 - x_a) \;\Longrightarrow\; x_a(1 - a + a x_a) = 0 \;\Longrightarrow\; x_a = 0,\; 1 - \frac{1}{a}. \tag{31.5}$$

Since $0 \le x_a \le 1$, there is only one fixed point (namely, $x = 0$) for $a \le 1$, and two fixed points ($x = 0$ and $x = 1 - 1/a$) for $a > 1$.

What is the significance of fixed points? When $a < 1$, Equation (31.2) shows (since both $x_k$ and $1 - x_k$ are at most one) that the population keeps decreasing until it vanishes completely. And this is independent of the initial value of $x$. It is instructive to show this pictorially. Figure 31.1(a) shows the logistic map function with $a = 0.5$. Start at any point $x_0$ on the horizontal axis; draw a vertical line to intersect the logistic map function at $f_a(x_0) = x_1$; from the intersection draw a horizontal line to intersect the line $y = x$ at $y_1 = x_1$; draw a vertical line to intersect the logistic map function at $f_a(x_1) = x_2$; continue to find $x_3$ and the rest of the $x$'s. The diagram shows that the $x$'s are getting smaller and smaller.
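The graphical construction just described can be sketched as a list of vertices of the staircase ("cobweb") path, which any plotting tool could then draw. This is a minimal illustration; the function names are hypothetical and no particular plotting library is assumed.

```python
def logistic(a, x):
    """Logistic map function f_a(x) = a x (1 - x)."""
    return a * x * (1.0 - x)


def cobweb(a, x0, n):
    """Vertices of the staircase construction described in the text.

    Start at (x0, 0); move vertically to the curve y = f_a(x), then
    horizontally to the line y = x, and repeat n times.
    """
    pts = [(x0, 0.0)]
    x = x0
    for _ in range(n):
        y = logistic(a, x)
        pts.append((x, y))  # vertical move: lands on the curve at (x_k, x_{k+1})
        pts.append((y, y))  # horizontal move: lands on the diagonal at (x_{k+1}, x_{k+1})
        x = y
    return pts


for point in cobweb(0.5, 0.9, 3):
    print(point)
```

For $a = 0.5$ the points on the diagonal march toward the origin, which is what Figure 31.1(a) depicts.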

What happens when $a > 1$? Figure 31.1(b) shows the logistic map function with $a = 2$. We note that the orbit is attracted to the fixed point at $x = 0.5$. We also note that the fixed point at $x = 0$ has now turned into a “repellor.” We can treat the behavior of the logistic map at general fixed points analytically.

Figure 31.1: (a) Regardless of the value of $x_0$, the orbit always ends up at the origin when $a < 1$. (b) Even for $a > 1$, it appears that the orbit always ends up at some attractor regardless of the value of $x_0$. Note that now the origin has become a “repellor.”

First let us consider a general (one-dimensional) iterated map as given by Equation (31.3). We are seeking the fixed points of (31.3). These points, commonly labeled by an asterisk, satisfy

$$x^* = f_a(x^*),$$

i.e., they are intersections of the curves $y = x$ and $y = f_a(x)$ in the $xy$-plane. An important property of fixed points is their stability, i.e., whether they are attractors or repellors. To test this property, we Taylor-expand the iterated map function around $x^*$, keeping the first two terms:

$$x_{k+1} = f_a(x_k) = f_a(x^*) + \left.\frac{df_a}{dx}\right|_{x^*}(x_k - x^*) = x^* + \left.\frac{df_a}{dx}\right|_{x^*}(x_k - x^*)$$

or

$$\frac{|x_{k+1} - x^*|}{|x_k - x^*|} = \left|\left.\frac{df_a}{dx}\right|_{x^*}\right| = |f_a'(x^*)|.$$

So, $x_{k+1}$ will be farther away from (or closer to) $x^*$ than $x_k$ if the absolute value of the derivative of the function is greater than one (or less than one).

Box 31.1.1. A fixed point $x^*$ of an iterated map (31.3) is stable if
$|f_a'(x^*)| < 1$ and unstable if $|f_a'(x^*)| > 1$.


Example 31.1.1. For the logistic map, $f_a(x) = ax(1 - x)$, so that $f_a'(x) = a - 2ax$. The fixed points are $x_1^* = 0$ and $x_2^* = 1 - 1/a$. Therefore,

$$f_a'(x_1^*) = f_a'(0) = a \quad\text{and}\quad f_a'(x_2^*) = f_a'(1 - 1/a) = 2 - a. \tag{31.6}$$















It follows that the fixed point at $x = 0$ is stable (attractive) if $a < 1$, while for this same value of $a$ the fixed point $x_2^*$ is unstable (repulsive). Thus, for $a < 1$, all trajectories are attracted to the fixed point at $x = 0$.

Equation (31.6) also shows that for $1 < a < 3$, the other fixed point becomes stable while the fixed point at the origin becomes unstable. This is also consistent with the behavior of the logistic map depicted in Figure 31.1(b). ■
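Box 31.1.1 and Example 31.1.1 can be turned into a small numerical check. The sketch below (helper names hypothetical) lists the fixed points (31.5) that lie in $[0, 1]$ and classifies each by $|f_a'(x^*)|$; the case $|f_a'| = 1$ is reported as marginal because the linear test says nothing there.

```python
def logistic_fixed_points(a):
    """Fixed points of f_a(x) = a x (1 - x) that lie in [0, 1], Equation (31.5)."""
    points = [0.0]
    if a > 1:
        points.append(1.0 - 1.0 / a)
    return points


def logistic_prime(a, x):
    """Derivative f_a'(x) = a - 2 a x."""
    return a - 2.0 * a * x


def classify(a, x_star):
    """Stability test of Box 31.1.1 applied to the fixed point x_star."""
    slope = abs(logistic_prime(a, x_star))
    if slope < 1:
        return "stable"
    if slope > 1:
        return "unstable"
    return "marginal"  # |f'| = 1: the linear test is inconclusive


for a in (0.5, 2.0, 2.9, 3.2):
    print(a, [(round(x, 4), classify(a, x)) for x in logistic_fixed_points(a)])
```

The value $a = 3.2$ is included to show that, beyond $a = 3$, neither fixed point is stable, which is where the discussion of bifurcation picks up.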

The criterion of Box 31.1.1 can also be stated graphically. Since 1 is the slope of the line $y = x$, and since a fixed point is an intersection of the two curves $y = x$ and $y = f_a(x)$, the criterion of Box 31.1.1 is a comparison of the slope of the tangent to $y = f_a(x)$ with the slope of $y = x$: A fixed point $x^*$ of an iterated map (31.3) is stable if the acute angle that the tangent line at $(x^*, f_a(x^*))$ makes with the $x$-axis is smaller than the corresponding angle of
the line $y = x$. If this angle is larger, then the fixed point is unstable. Provided that $f_a'(x^*) > -1$, this is equivalent to the simpler statement:


Box 31.1.2. A fixed point $x^*$ of an iterated map $f_a(x)$ is stable (unstable)
if immediately to the right of $x^*$, the curve $y = f_a(x)$ lies below (above)
the line $y = x$.


31.1.2 Bifurcation 

Although the logistic map has no stable fixed points for $a > 3$, we may ask whether there are points at which the iterated map is “semi-stable.” What does this mean? Instead of demanding strict stability or instability, let us consider a case in which the map may oscillate between two values. This situation is neither completely stable nor completely unstable: Although the system moves away from the point in question, it does not leave it forever. Suppose that just above the largest value of $a$ for which the fixed point $1 - 1/a$ is stable (namely $a = 3$), the system starts to oscillate between two values of $x$. This is an example of bifurcation:


Box 31.1.3. When the development of a system splits into two regions as
a parameter of the equations of motion of the system increases slightly, we
say that a bifurcation has occurred and call the splitting of the trajectory
a period-doubling bifurcation.


Suppose that there are two “fixed” points $x_1^*$ and $x_2^*$ between which the function oscillates, such as the two points illustrated in Figure 31.2(a). These fixed points must satisfy

$$x_2^* = f_a(x_1^*), \qquad x_1^* = f_a(x_2^*). \tag{31.7}$$

To gain further insight into the behavior of the logistic map, we introduce the so-called second iterate of $f_a$, denoted by $f_a^{[2]}$ and defined by

$$f_a^{[2]}(x) \equiv f_a(f_a(x)). \tag{31.8}$$

Figure 31.2: (a) For $a = 3.1$, there are clearly two attractors located at $x = 0.5580$ and $x = 0.7646$. (b) For $a = 3.99$, no attractor seems to exist because the iterations do not seem to converge in the diagram.

From this definition, it is clear that every fixed point of $f_a$ is also a fixed point of $f_a^{[2]}$. However, the converse statement is not true. In fact, $x_1^*$ and $x_2^*$ defined in Equation (31.7) are fixed points of $f_a^{[2]}$,

$$f_a^{[2]}(x_1^*) = f_a(f_a(x_1^*)) = f_a(x_2^*) = x_1^*, \qquad f_a^{[2]}(x_2^*) = f_a(f_a(x_2^*)) = f_a(x_1^*) = x_2^*,$$

but not of $f_a$. It now follows that fixed points of $f_a^{[2]}$ give information about period-doubling bifurcation.

For the logistic map, $f_a^{[2]}$ can be found easily:

$$\begin{aligned} f_a^{[2]}(x) &= f_a(f_a(x)) = a f_a(x)[1 - f_a(x)] \\ &= a[ax(1-x)][1 - ax(1-x)] = a^2 x(1-x)(1 - ax + ax^2) \\ &= -a^3x^4 + 2a^3x^3 - (a^2 + a^3)x^2 + a^2 x. \end{aligned} \tag{31.9}$$

The fixed points of $f_a^{[2]}(x)$ are, therefore, determined by the equation

$$x = -a^3x^4 + 2a^3x^3 - (a^2 + a^3)x^2 + a^2 x,$$

which shows that there are, in general, four fixed points, one at $x = 0$, and three others satisfying the cubic equation

$$a^3x^3 - 2a^3x^2 + (a^3 + a^2)x - a^2 + 1 = 0. \tag{31.10}$$
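A quick numerical spot-check of the algebra in (31.9) and (31.10) can be useful when the expansion is done by hand. The sketch below (function names hypothetical) compares $f_a(f_a(x))$ with the expanded quartic on a grid of points and confirms that the fixed point $1 - 1/a$ of $f_a$ is a root of the cubic, which is the fact used next to factor it.

```python
def f(a, x):
    """Logistic map function f_a(x) = a x (1 - x)."""
    return a * x * (1.0 - x)


def second_iterate_quartic(a, x):
    """Expanded form of f_a^[2](x) from the last line of Equation (31.9)."""
    return -a**3 * x**4 + 2 * a**3 * x**3 - (a**2 + a**3) * x**2 + a**2 * x


def cubic_31_10(a, x):
    """Left-hand side of the cubic equation (31.10)."""
    return a**3 * x**3 - 2 * a**3 * x**2 + (a**3 + a**2) * x - a**2 + 1


for a in (2.0, 3.2, 3.9):
    # Largest mismatch between f(f(x)) and the expanded quartic on a grid.
    worst = max(abs(f(a, f(a, i / 10)) - second_iterate_quartic(a, i / 10)) for i in range(11))
    # The fixed point 1 - 1/a of f_a should make the cubic vanish.
    print(a, worst, cubic_31_10(a, 1 - 1 / a))
```

Both printed numbers should be at the level of floating-point round-off.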

We can actually solve this equation because we know that one of its roots is $x_1(a) = 1 - 1/a$, a fixed point of $f_a(x)$. The cubic polynomial in Equation (31.10) can thus be factored as $a^3[x - x_1(a)]$ times a quadratic polynomial whose roots give the remaining solutions to (31.10). The reader may verify that these roots are

$$x_2(a) = \frac{1 + a + \sqrt{a^2 - 2a - 3}}{2a}, \qquad x_3(a) = \frac{1 + a - \sqrt{a^2 - 2a - 3}}{2a}. \tag{31.11}$$

These roots are real only for $a \ge 3$. The two functions start out at the common value of $\tfrac{2}{3}$ when $a = 3$. Then, as functions of $a$, $x_2(a)$ monotonically increases and asymptotically approaches 1, while $x_3(a)$ monotonically decreases and asymptotically approaches 0.
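As a minimal sketch (names hypothetical), the roots (31.11) can be evaluated directly and checked against the defining property (31.7): the logistic map should carry each of the two points into the other. For $a = 3.1$ they reproduce the two attractors quoted for Figure 31.2(a).

```python
import math


def f(a, x):
    """Logistic map function f_a(x) = a x (1 - x)."""
    return a * x * (1.0 - x)


def two_cycle(a):
    """Return (x2(a), x3(a)) from Equation (31.11); real only for a >= 3."""
    disc = a * a - 2 * a - 3
    if disc < 0:
        raise ValueError("the roots (31.11) are complex for a < 3")
    root = math.sqrt(disc)
    return (1 + a + root) / (2 * a), (1 + a - root) / (2 * a)


x2, x3 = two_cycle(3.1)
print(x2, x3)                   # about 0.7646 and 0.5580, as in Figure 31.2(a)
print(f(3.1, x2), f(3.1, x3))   # the map swaps the two values
```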

We are interested in those values of $a$ for which the fixed points are not completely unstable. In the present case, this means that the value of the iterated map must oscillate between only two values. This will happen only if the two points are stable fixed points of $f_a^{[2]}$. Since by Box 31.1.1 stability imposes a condition on the derivative of the function, we need to look at the derivative of $f_a^{[2]}$.

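Using the chain rule, which in its most general form is

$$\frac{d}{dx}\, g(h(x)) = g'(h(x))\, h'(x),$$

we obtain

$$\frac{df_a^{[2]}}{dx} = f_a'(f_a(x))\, f_a'(x). \tag{31.12}$$

In particular, if $x$ happens to be a fixed point $x^*$ of $f_a$, then

$$\left.\frac{df_a^{[2]}}{dx}\right|_{x^*} = f_a'(x^*)\, f_a'(x^*) = [f_a'(x^*)]^2. \tag{31.13}$$

This shows that if $x^*$ is a stable fixed point of $f_a$, then

$$|f_a'(x^*)| < 1 \;\Longrightarrow\; [f_a'(x^*)]^2 < 1 \;\Longrightarrow\; \left|\left.\frac{df_a^{[2]}}{dx}\right|_{x^*}\right| < 1,$$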


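and $x^*$ is a stable fixed point of $f_a^{[2]}$ as well. Furthermore, at the two fixed points of $f_a^{[2]}$ discussed above, Equation (31.12) yields

$$\begin{aligned} \left.\frac{df_a^{[2]}}{dx}\right|_{x_1^*} &= f_a'(f_a(x_1^*))\, f_a'(x_1^*) = f_a'(x_2^*)\, f_a'(x_1^*), \\ \left.\frac{df_a^{[2]}}{dx}\right|_{x_2^*} &= f_a'(f_a(x_2^*))\, f_a'(x_2^*) = f_a'(x_1^*)\, f_a'(x_2^*). \end{aligned} \tag{31.14}$$

It follows that $f_a^{[2]}$ has the same derivatives at these two points.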
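Equation (31.14) can be checked numerically at the two points of (31.11). A short calculation not given in the text (using $x_2 + x_3 = (1 + a)/a$ and $x_2 x_3 = (1 + a)/a^2$, both of which follow from (31.11)) shows that the common slope equals $4 + 2a - a^2$. It is $+1$ at $a = 3$, in line with (31.13), and reaches $-1$ when $a^2 - 2a - 5 = 0$, i.e., at $a = 1 + \sqrt{6}$, the value that reappears in (31.16). The sketch below, with hypothetical helper names, compares the chain-rule slopes with that closed form.

```python
import math


def f(a, x):
    """Logistic map function f_a(x) = a x (1 - x)."""
    return a * x * (1.0 - x)


def f_prime(a, x):
    """Derivative f_a'(x) = a (1 - 2x)."""
    return a * (1.0 - 2.0 * x)


def two_cycle(a):
    """The pair (x2(a), x3(a)) of Equation (31.11), valid for a >= 3."""
    root = math.sqrt(a * a - 2 * a - 3)
    return (1 + a + root) / (2 * a), (1 + a - root) / (2 * a)


def second_iterate_slope(a, x):
    """Chain rule (31.12): d f_a^[2]/dx = f_a'(f_a(x)) f_a'(x)."""
    return f_prime(a, f(a, x)) * f_prime(a, x)


for a in (3.0, 3.2, 3.4, 1 + math.sqrt(6)):
    x2, x3 = two_cycle(a)
    s2, s3 = second_iterate_slope(a, x2), second_iterate_slope(a, x3)
    print(f"a={a:.5f}  slope at x2={s2:+.6f}  slope at x3={s3:+.6f}  4+2a-a^2={4 + 2 * a - a * a:+.6f}")
```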
The concept of iteration of $f_a$ can be readily generalized. The $n$th iterate of $f_a$ is

$$f_a^{[n]}(x) \equiv \underbrace{f_a(f_a(\cdots f_a}_{n\ \text{times}}(x)\cdots)),$$

and as in the case of $f_a^{[2]}$, the fixed points of $f_a$ are also fixed points of $f_a^{[n]}$, and the stable fixed points of $f_a$ are also stable fixed points of $f_a^{[n]}$. The converse of neither of these statements is, in general, true.

The utility of the concept of the $n$th iterate comes in the analysis of the location of bifurcation points. To be specific, let us go back to the logistic map and Equation (31.6). The stable points, being characterized by the absolute value of the derivative of the map function, occur at $x = 0$ when $0 < a < 1$ and at $x = 1 - 1/a$ when $1 < a < 3$. Within the $a$-range of stability,¹ the derivative of $f_a$ ranges between $-1$ and $+1$, starting with $+1$ at $a = 1$ and ending with $-1$ at $a = 3$ [see the second equation in (31.6)]. Beyond this value of $a$, which is the parameter at which the 2-cycle fixed point occurs and which we now denote by $a_1$, $f_a$ has no stable points. Equation (31.13), however, shows that the derivative of $f_a^{[2]}$ is $+1$ there. This means that the derivative of $f_a^{[2]}$ can decrease down to $-1$ as $a$ increases beyond $a_1$. In fact, what happens as $a$ increases past $a_1$ is precisely a repetition of what happened to $f_a$ between $a = 1$ and $a = 3$: The derivative of $f_a^{[2]}$ keeps decreasing until, at a certain value of $a$ denoted by $a_2$, a period-doubling bifurcation occurs for $f_a^{[2]}$. This corresponds to a 4-cycle fixed point. Thus a 4-cycle fixed point $x^*$, as well as the corresponding value of $a$, is obtained by imposing the two requirements

$$f_{a_2}^{[2]}(x^*) = x^* \qquad\text{and}\qquad \left.\frac{df_{a_2}^{[2]}}{dx}\right|_{x^*} = -1. \tag{31.15}$$


This equation entails an important result. By Equation (31.14), the derivatives of $f_a^{[2]}$ at its two stable points are equal. Therefore, both stable points give rise to the same pair of equations (31.15). In particular,


Box 31.1.4. Any value of $a_2$ that gives a solution for the first fixed point
must also give a solution for the second fixed point. In fact, we should expect
two values of $x^*$ for every $a_2$ that solves the pair of equations (31.15).


For the logistic map, Equation (31.15) becomes [see (31.9)]

$$\begin{aligned} x &= a_2^2\, x(1 - x)(1 - a_2 x + a_2 x^2), \\ -1 &= -4a_2^3 x^3 + 6a_2^3 x^2 - 2(a_2^2 + a_2^3)x + a_2^2. \end{aligned}$$

1 The point $x = 0$ is an exception: because we are assuming that $a$ is a positive quantity, $f_a'(0) = a$ cannot be negative.
One can solve these equations and obtain eight possible pairs $(x^*, a_2)$. The only two acceptable real pairs which have a value of $a_2$ larger than three are

$$(x^*, a_2) = \left(\frac{4 + \sqrt{6} \pm \sqrt{14 - 4\sqrt{6}}}{10},\; 1 + \sqrt{6}\right) \approx (0.644949 \pm 0.204989,\; 3.44949). \tag{31.16}$$

In particular, $a_2 = 1 + \sqrt{6} \approx 3.44949$ is the value of $a$ at which the 4-cycle fixed points appear in Figure 31.3, corresponding to the two $x$ values of approximately 0.85 and 0.44.

The generalization to $2^n$-cycles is now clear. One simply constructs the $2^n$th iterate of $f_a$ and solves the two equations

$$f_a^{[2^n]}(x^*) = x^* \qquad\text{and}\qquad \left.\frac{df_a^{[2^n]}}{dx}\right|_{x^*} = -1.$$

In practice, these are too complicated to solve analytically, but numerical methods are available for their solution. Each solution consists of a pair $(x^*, a_{n+1})$, where $x^*$ is a $2^n$-cycle fixed point at which the next period doubling takes place and $a_{n+1}$ is the corresponding control parameter. As in the case of $f_a^{[2]}$, for each acceptable $a_{n+1}$, there are $2^n$ such fixed points.
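The text leaves the numerical method unspecified. One possible approach, assumed here and not taken from the book, is Newton's method in the two unknowns $(x^*, a)$ with a finite-difference Jacobian. The derivative of the $m$th iterate is accumulated as a product of $f_a'$ along the orbit (the chain rule). All function names, the step size `h`, and the starting guesses are illustrative choices.

```python
import math


def iterate_with_slope(a, x, m):
    """Return (f^[m](x), d f^[m]/dx) for the logistic map via the chain-rule product."""
    slope = 1.0
    for _ in range(m):
        slope *= a * (1.0 - 2.0 * x)  # multiply by f_a'(x_k)
        x = a * x * (1.0 - x)         # advance x_k -> x_{k+1}
    return x, slope


def residuals(x, a, m):
    """The two conditions f^[m](x) = x and d f^[m]/dx = -1, written as G = 0."""
    y, s = iterate_with_slope(a, x, m)
    return y - x, s + 1.0


def solve_period_doubling(x, a, m, tol=1e-13, max_iter=100, h=1e-7):
    """Newton's method in (x, a) with a central-difference 2x2 Jacobian."""
    for _ in range(max_iter):
        g1, g2 = residuals(x, a, m)
        g1xp, g2xp = residuals(x + h, a, m)
        g1xm, g2xm = residuals(x - h, a, m)
        g1ap, g2ap = residuals(x, a + h, m)
        g1am, g2am = residuals(x, a - h, m)
        j11, j21 = (g1xp - g1xm) / (2 * h), (g2xp - g2xm) / (2 * h)
        j12, j22 = (g1ap - g1am) / (2 * h), (g2ap - g2am) / (2 * h)
        det = j11 * j22 - j12 * j21
        dx = (g1 * j22 - g2 * j12) / det  # Cramer's rule for J [dx, da] = G
        da = (g2 * j11 - g1 * j21) / det
        x, a = x - dx, a - da
        if abs(dx) < tol and abs(da) < tol:
            break
    return x, a


x_star, a_2 = solve_period_doubling(0.85, 3.45, 2)
print(x_star, a_2)  # compare with (31.16): about 0.849938 and 3.44949
x_star, a_1 = solve_period_doubling(0.6, 2.9, 1)
print(x_star, a_1)  # the first period doubling: x* = 2/3, a = 3
```

For larger $m = 2^n$ the same routine applies, but Newton's method then needs a starting guess close to one of the cycle points, which is easiest to read off a bifurcation diagram.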


31.1.3 Onset of Chaos 

Suppose we keep increasing $a$ slowly. It may happen that at a certain value of $a$ no finite set of “stable” points exists. A graphical analysis of this situation is depicted in Figure 31.2(b), showing that the behavior of the logistic map is chaotic. What is the relation between the value of $a$ at which chaos sets in (which we denote by $a_c$) and $a_n$? Considering the chaotic behavior as one corresponding to an infinite-cycle fixed point, we conclude that, on a bifurcation diagram, chaos will occur at

$$a_c = a_\infty \equiv \lim_{n\to\infty} a_n. \tag{31.17}$$

Figure 31.3: The bifurcation diagram for the logistic map. The behavior of the function is analytically very simple for $a < 3$. For $a > 3$, the behavior is more complicated. The 4-cycle fixed points are clearly shown to occur at $a \approx 3.45$. From approximately 3.57 onward, chaotic behavior sets in.
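A bifurcation diagram like Figure 31.3 can be produced by discarding a transient and recording the values the orbit keeps visiting. In the minimal sketch below (the function name and the defaults for `transient`, `keep`, and `digits` are assumptions), the long-time values are rounded and collected into a set, so a stable $2^n$-cycle should show up as $2^n$ distinct values and a chaotic orbit as many scattered ones. Plotting the returned values against $a$ for many $a$ between 2.5 and 4 would give the diagram; the plotting itself is omitted.

```python
def attractor_points(a, x0=0.2, transient=2000, keep=256, digits=6):
    """Long-time values of the logistic map at parameter a.

    The first `transient` iterates are discarded; the next `keep` iterates are
    rounded to `digits` decimals and collected, so a stable 2^n-cycle shows up
    as 2^n distinct values, while a chaotic orbit gives many scattered values.
    """
    x = x0
    for _ in range(transient):
        x = a * x * (1.0 - x)
    seen = set()
    for _ in range(keep):
        x = a * x * (1.0 - x)
        seen.add(round(x, digits))
    return sorted(seen)


for a in (2.8, 3.2, 3.5, 3.9):
    print(a, len(attractor_points(a)))
```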


The limit in (31.17) is one way of characterizing chaos. A more direct way is to look at the trajectories. Two nearby points starting at $x_0$ and $x_0 + \epsilon$ will be separated after $n$ iterations by a distance

$$d_n = \left| f_a^{[n]}(x_0 + \epsilon) - f_a^{[n]}(x_0) \right|.$$

If this separation grows exponentially, we have a chaotic behavior. We define $\lambda_{x_0}^{[n]}$, the $n$th iterative Lyapunov exponent at $x_0$, by

$$d_n = d_0\, e^{\lambda_{x_0}^{[n]} n} = \epsilon\, e^{\lambda_{x_0}^{[n]} n}.$$

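Then

$$\lambda_{x_0}^{[n]} = \frac{1}{n}\ln\frac{d_n}{d_0} = \frac{1}{n}\ln\left|\frac{f_a^{[n]}(x_0 + \epsilon) - f_a^{[n]}(x_0)}{\epsilon}\right|.$$

As $\epsilon \to 0$, the argument of the logarithm becomes the absolute value of the derivative of the $n$th iterate at $x_0$. But by the chain rule, with $x_k \equiv f_a^{[k]}(x_0)$,

$$\left.\frac{df_a^{[n]}}{dx}\right|_{x_0} = \left.\frac{d}{dx}\left[f_a\left(f_a^{[n-1]}(x)\right)\right]\right|_{x_0} = f_a'(x_{n-1})\left.\frac{df_a^{[n-1]}}{dx}\right|_{x_0}.$$

Using this relation repeatedly, we obtain

$$\left.\frac{df_a^{[n]}}{dx}\right|_{x_0} = f_a'(x_{n-1})\, f_a'(x_{n-2}) \cdots f_a'(x_{n-k}) \left.\frac{df_a^{[n-k]}}{dx}\right|_{x_0} = f_a'(x_{n-1})\, f_a'(x_{n-2}) \cdots f_a'(x_0),$$

where in the last step we set $k = n - 1$ and noted that $f_a^{[1]} = f_a$. It now follows that

$$\lambda_{x_0}^{[n]} = \frac{1}{n}\ln\left[|f_a'(x_0)|\,|f_a'(x_1)| \cdots |f_a'(x_{n-1})|\right],$$

which can also be written as

$$\lambda_{x_0}^{[n]} = \frac{1}{n}\sum_{k=0}^{n-1}\ln|f_a'(x_k)|. \tag{31.18}$$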
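The two routes to $\lambda_{x_0}^{[n]}$ can be compared numerically: the sum (31.18) over the orbit, and the direct definition $d_n = \epsilon\, e^{\lambda n}$ using two orbits started a distance $\epsilon$ apart. The sketch below is a minimal illustration with hypothetical names. The agreement holds only while $d_n$ stays small enough for the linearization behind (31.18) to apply, which is why $n$ is kept short and $\epsilon$ tiny.

```python
import math


def f(a, x):
    """Logistic map function f_a(x) = a x (1 - x)."""
    return a * x * (1.0 - x)


def lyapunov_n(a, x0, n):
    """n-th iterative Lyapunov exponent from Equation (31.18)."""
    total, x = 0.0, x0
    for _ in range(n):
        total += math.log(abs(a * (1.0 - 2.0 * x)))  # ln|f_a'(x_k)|
        x = f(a, x)
    return total / n


def lyapunov_from_separation(a, x0, n, eps=1e-9):
    """Same quantity read off from d_n = eps * exp(n * lambda), for small eps."""
    x, y = x0, x0 + eps
    for _ in range(n):
        x, y = f(a, x), f(a, y)
    return math.log(abs(y - x) / eps) / n


print(lyapunov_n(3.9, 0.3, 10), lyapunov_from_separation(3.9, 0.3, 10))
```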
It is common to define the local Lyapunov exponent as follows:

$$\lambda_{x_0} = \lim_{n\to\infty} \lambda_{x_0}^{[n]} = \lim_{n\to\infty} \frac{1}{n}\sum_{k=0}^{n-1}\ln|f_a'(x_k)|. \tag{31.19}$$

To characterize the chaotic behavior of systems obeying iterated maps, one has to calculate $\lambda_{x_0}$ for a sample of trajectory points and then take their average. The result is called the Lyapunov exponent for the system. It turns out that

Box 31.1.5. A necessary condition for a system obeying an iterated map
function $f_a(x)$ to be chaotic for $a$ is for its Lyapunov exponent to be
positive at $a$.
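Box 31.1.5 can be explored numerically. The text calls for averaging $\lambda_{x_0}$ over a sample of starting points; the sketch below instead follows one long orbit after discarding a transient, a common shortcut that we assume gives the same average. The function name and its defaults are illustrative. For comparison, $a = 4$ has the exactly known value $\ln 2$ (a standard result not derived in the text), while periodic regimes such as $a = 2.5$ or $a = 3.2$ should give negative values.

```python
import math


def lyapunov_exponent(a, x0=0.1234, transient=1000, n=100_000):
    """Estimate the Lyapunov exponent of the logistic map at parameter a.

    Discards `transient` iterates, then averages ln|f_a'(x_k)| over the next n
    points of a single orbit (assumed to stand in for an average over many x0).
    """
    x = x0
    for _ in range(transient):
        x = a * x * (1.0 - x)
    total = 0.0
    for _ in range(n):
        total += math.log(abs(a * (1.0 - 2.0 * x)))
        x = a * x * (1.0 - x)
    return total / n


for a in (2.5, 3.2, 3.5, 3.7, 4.0):
    print(a, round(lyapunov_exponent(a), 3))
```

A sign change of the estimate as $a$ is swept is a quick way to locate the chaotic parameter ranges seen in Figure 31.3.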


31.2 Systems Obeying DEs 

As a paradigm of a nonlinear dynamical system, we shall study the motion of a harmonically driven dissipative pendulum whose angle of oscillation is not necessarily small. The equation of motion of such a pendulum, coming directly from the second law of motion, is

$$m\frac{d^2x}{dt^2} = F_0\cos(\Omega t) - b\frac{dx}{dt} - mg\sin\theta, \tag{31.20}$$

where $x$ is the length (as measured from the equilibrium position) of the arc of the circle on which mass $m$ moves (see Figure 31.4).

The first term on the RHS of Equation (31.20) is the harmonic driving force with angular frequency $\Omega$, the second is the dissipative (friction, drag, etc.) force, and the last is the gravitational force in the direction of motion.



Figure 31.4: The displacement x and the gravitational force acting on the pendulum.


The minus signs appear because the corresponding forces oppose the motion. Since the pendulum is confined to a circle, $x$ and $\theta$ are related via $x = l\theta$, and we obtain

$$ml\frac{d^2\theta}{dt^2} = F_0\cos(\Omega t) - bl\frac{d\theta}{dt} - mg\sin\theta.$$


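Let us change $t$ to $t = \tau\sqrt{l/g}$, where $\tau$ is a dimensionless parameter measuring time in units of $T/(2\pi)$, with $T$ being the period of the small-angle pendulum.² Then, with the dimensionless constants

$$\gamma \equiv \frac{b}{m}\sqrt{\frac{l}{g}}, \qquad \phi_0 \equiv \frac{F_0}{mg}, \qquad \omega_D \equiv \Omega\sqrt{\frac{l}{g}},$$

the DE of motion becomes

$$\frac{d^2\theta}{d\tau^2} + \gamma\frac{d\theta}{d\tau} + \sin\theta = \phi_0\cos(\omega_D\tau).$$

It is customary to write this as

$$\ddot\theta + \gamma\dot\theta + \sin\theta = \phi_0\cos(\omega_D t), \tag{31.21}$$

where now $t$ is the “dimensionless” time, and the dot indicates differentiation with respect to this $t$.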
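Dividing the pendulum equation by $mg$ and setting $t = \tau\sqrt{l/g}$ gives the dimensionless constants $\gamma = (b/m)\sqrt{l/g}$, $\phi_0 = F_0/(mg)$, and $\omega_D = \Omega\sqrt{l/g}$. A small helper (hypothetical name, SI units assumed) makes the conversion from laboratory values explicit.

```python
import math


def pendulum_dimensionless(m, l, g, b, F0, Omega):
    """Convert the parameters of (31.20) into those of (31.21).

    Time is rescaled as t = tau * sqrt(l/g), i.e. measured in units of T/(2 pi).
    Returns (gamma, phi0, omega_D).
    """
    s = math.sqrt(l / g)
    gamma = (b / m) * s
    phi0 = F0 / (m * g)
    omega_D = Omega * s
    return gamma, phi0, omega_D


# Hypothetical laboratory values, chosen only for illustration.
print(pendulum_dimensionless(m=0.2, l=0.5, g=9.81, b=0.05, F0=0.3, Omega=3.0))
```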


31.2.1 The Phase Space 

The study of dynamical systems (i.e., systems obeying DEs) is considerably more complicated than that of systems obeying iterated maps. While in the latter case we were able to use a fair amount of analytical tools, the discussion of the former requires an enormous amount of numerical computation.

One of the devices that facilitates our understanding of dynamical systems is the phase space diagram. The phase space of a dynamical system is a Cartesian multidimensional space whose axes consist of positions and momenta of the particles in the system. In practice, the velocities of the particles are mostly used instead of momenta. Thus a single particle confined to one dimension (such as a particle in free fall, a mass attached to a spring, or a pendulum) has a two-dimensional phase space corresponding to the particle’s position and velocity. Two particles moving in a single dimension have a four-dimensional phase space corresponding to two positions and two velocities. A single particle moving in a plane also has a four-dimensional phase space, because two coordinates are needed to determine the position of the particle and two components to determine its velocity, and a system of $N$ particles in space has a $6N$-dimensional phase space.

A trajectory of a dynamical system is a curve in its phase space corresponding to a possible motion of the system. If we can solve the equations of motion of a dynamical system, we can express all its position and velocity variables as functions of time, constituting a parametric equation of a curve in phase space. This curve is the trajectory of the dynamical system.

2 Recall that $T = 2\pi\sqrt{l/g}$. So $\tau = t/(T/2\pi)$ is indeed dimensionless.

Let us go back to our pendulum, and consider the simplest situation in which there is no driving force, the dissipative effects are turned off, and the angle of oscillation is small. Then (31.21) reduces to $\ddot\theta + \theta = 0$, whose most general solution is $\theta = A\cos(t + \alpha)$, so that

$$x_1 = \theta = A\cos(t + \alpha), \qquad x_2 = \omega = \dot\theta = \frac{d\theta}{dt} = -A\sin(t + \alpha). \tag{31.22}$$

This is a one-dimensional system (there is only one coordinate, $\theta$) with a two-dimensional phase space. Equation (31.22) is the parametric equation of a circle of radius $A$ in the $x_1x_2$-plane, since $x_1^2 + x_2^2 = A^2$. Because $A$ is arbitrary (it is, however, determined by initial conditions), there are (infinitely) many trajectories for this system, some of which are shown in Figure 31.5.

Let us now make the system a little more complicated by introducing a dissipative force, still keeping the angle small. The DE is now

$$\ddot\theta + \gamma\dot\theta + \theta = 0,$$

and the general solution for the damped oscillatory case is

$$x_1 = \theta(t) = Ae^{-\gamma t/2}\cos(\omega_0 t + \alpha), \qquad \text{where } \omega_0 = \frac{\sqrt{4 - \gamma^2}}{2},$$

with

$$x_2 = \omega = \dot\theta = -Ae^{-\gamma t/2}\left[\frac{\gamma}{2}\cos(\omega_0 t + \alpha) + \omega_0\sin(\omega_0 t + \alpha)\right].$$



Figure 31.5: The phase space trajectories of a pendulum undergoing small-angle 
oscillations with no driving or dissipative forces. Different circles correspond to different 
initial conditions. 
Figure 31.6: The phase space trajectories of a damped pendulum undergoing small-angle oscillations with no driving force. Different spirals correspond to different initial conditions. The larger shaded region, in time, shrinks to the smaller one.


The trajectories of this system are not as easily obtainable as those of the undamped linear oscillator discussed above. However, since the two coordinates of the phase space are given in terms of the parameter $t$, we can plot the trajectories. Two such trajectories for two different $A$'s (but the same $\gamma$) are shown in Figure 31.6.
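A minimal sketch that evaluates the closed-form phase-space point $(x_1, x_2)$ of the damped small-angle pendulum, samples one trajectory, and checks by finite differences that $x_2$ really is $d\theta/dt$ and that $\theta$ satisfies $\ddot\theta + \gamma\dot\theta + \theta = 0$. We assume $0 < \gamma < 2$ (the underdamped, "damped oscillatory" case); the parameter values and helper names are arbitrary illustrations.

```python
import math


def damped_state(t, A, alpha, gamma):
    """(x1, x2) = (theta, theta_dot) for the underdamped small-angle pendulum, 0 < gamma < 2."""
    w0 = math.sqrt(4.0 - gamma * gamma) / 2.0
    envelope = A * math.exp(-gamma * t / 2.0)
    phase = w0 * t + alpha
    x1 = envelope * math.cos(phase)
    x2 = -envelope * (gamma / 2.0 * math.cos(phase) + w0 * math.sin(phase))
    return x1, x2


def residual(t, A, alpha, gamma, h=1e-4):
    """theta'' + gamma theta' + theta, with theta'' from a central difference."""
    def theta(s):
        return damped_state(s, A, alpha, gamma)[0]

    second = (theta(t + h) - 2.0 * theta(t) + theta(t - h)) / h**2
    return second + gamma * damped_state(t, A, alpha, gamma)[1] + theta(t)


# One trajectory in the (x1, x2) plane: its distance from the origin shrinks with time.
traj = [damped_state(0.1 * k, A=1.0, alpha=0.0, gamma=0.3) for k in range(300)]
print(math.hypot(*traj[0]), math.hypot(*traj[-1]))
print(max(abs(residual(t, 1.0, 0.0, 0.3)) for t in (0.5, 2.0, 7.3)))
```

Feeding `traj` to any plotting tool gives a spiral of the kind shown in Figure 31.6.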

A new feature of this system is that regardless of where the trajectory starts at $t = 0$, it will end up at the origin. The analytic reason for this is, of course, the exponential factor in front of both coordinates, which will cause their decay to zero after a long time.³ It seems that the origin “attracts” all trajectories, and for this reason it is called an attractor.⁴

There are other kinds of attractors in nonlinear dynamics theory. For example, if trajectories approach an arc of a curve, or an area of a surface, then the curve or the area becomes the attractor. Furthermore, for a given value of the parameter, there may be more than one attractor for a given dynamical system (just as there was more than one fixed point for iterated maps); and it may happen that the trajectories approach these attractors only for certain initial values of the dynamical variables. The set of initial values corresponding to trajectories that are attracted to an attractor is called the basin of attraction for that attractor. The set of initial values that lie on the border between the basins of attraction of two different attractors is called a separatrix.
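Basins of attraction can be illustrated with the damped, undriven pendulum $\ddot\theta + \gamma\dot\theta + \sin\theta = 0$. If $\theta$ is not reduced modulo $2\pi$, the resting points $(2\pi n, 0)$ are distinct points of phase space, each with its own basin. The sketch below (hypothetical names; $\gamma = 0.3$, the integration time, and the classical RK4 step are our choices) labels a starting point $(\theta_0, \omega_0)$ by the $n$ of the resting point its trajectory approaches. Scanning $\omega_0$ shows where the labels change, that is, where the starting point crosses a separatrix.

```python
import math


def rhs(theta, omega, gamma):
    """Undriven pendulum: theta' = omega, omega' = -gamma omega - sin(theta)."""
    return omega, -gamma * omega - math.sin(theta)


def final_angle(theta, omega, gamma=0.3, t_end=100.0, dt=0.01):
    """Integrate with classical RK4 and return theta at t_end (not reduced mod 2 pi)."""
    for _ in range(int(round(t_end / dt))):
        k1 = rhs(theta, omega, gamma)
        k2 = rhs(theta + dt / 2 * k1[0], omega + dt / 2 * k1[1], gamma)
        k3 = rhs(theta + dt / 2 * k2[0], omega + dt / 2 * k2[1], gamma)
        k4 = rhs(theta + dt * k3[0], omega + dt * k3[1], gamma)
        theta += dt / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        omega += dt / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
    return theta


def basin_label(theta0, omega0):
    """Index n of the resting point (2 pi n, 0) that the trajectory approaches."""
    return round(final_angle(theta0, omega0) / (2 * math.pi))


for w0 in (0.5, 1.5, 2.5, 3.5, 5.0):
    print(w0, basin_label(0.0, w0))
```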

3 “Long” compared to $1/\gamma$.

4 This is what we called a fixed point in our discussion of iterated maps. However, because of the existence of a variety of “fixed objects” for dynamical systems, it is more common to call these attractors.

31.2.2 Autonomous Systems

Now we want to consider motion with large angles. The DE is then no longer linear. The discussion of (nonlinear) DEs of higher orders is facilitated by treating derivatives as independent variables. The defining relations for these derivatives as well as the DE itself give a set of first-order DEs. For example, the third-order DE


$$\frac{d^3x}{dt^3} = x^4\frac{d^2x}{dt^2} + \sin(\omega t - kx)\frac{dx}{dt} + e^x\cos(\omega t)$$

can be turned into three first-order DEs by setting $\dot x = x_1$ and $\ddot x = x_2$. Then the DE splits into the following three first-order DEs:

$$\dot x = x_1, \qquad \dot x_1 = x_2, \qquad \dot x_2 = x^4 x_2 + \sin(\omega t - kx)\,x_1 + e^x\cos(\omega t).$$

This is a set of three equations in the three unknowns $x$, $x_1$, and $x_2$, which, in principle, can be solved.

It is desirable to have a so-called autonomous system of first-order DEs. These are systems which have no explicit dependence on the independent variable (in our case, $t$). Our equations above clearly form a set of nonautonomous DEs. The nonautonomous systems can be reduced to autonomous ones by a straightforward trick: One simply calls $t$ a new variable. More specifically, in the equations above, let $x_3 = \omega t$. Then the nonautonomous equations above turn into the following autonomous system:

$$\dot x = x_1, \qquad \dot x_1 = x_2, \qquad \dot x_2 = x^4 x_2 + \sin(x_3 - kx)\,x_1 + e^x\cos x_3, \qquad \dot x_3 = \omega.$$

We have had to increase the dimension of our phase space by one, but in return, we have obtained an autonomous system of DEs.
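The trick of promoting $\omega t$ to a state variable can be made concrete by coding both right-hand sides of the example system $\dot x = x_1$, $\dot x_1 = x_2$, $\dot x_2 = x^4 x_2 + \sin(\omega t - kx)\,x_1 + e^x\cos(\omega t)$ and checking that they agree whenever $x_3 = \omega t$. The values of $k$, $\omega$, $t$, and the state are hypothetical, chosen only for the comparison.

```python
import math


def nonautonomous_rhs(t, y, k, omega):
    """(x, x1, x2)' for the third-order example, with explicit dependence on t."""
    x, x1, x2 = y
    return [x1, x2, x**4 * x2 + math.sin(omega * t - k * x) * x1 + math.exp(x) * math.cos(omega * t)]


def autonomous_rhs(state, k, omega):
    """(x, x1, x2, x3)' with x3 = omega t as a state variable; no explicit t appears."""
    x, x1, x2, x3 = state
    return [x1, x2, x**4 * x2 + math.sin(x3 - k * x) * x1 + math.exp(x) * math.cos(x3), omega]


# Hypothetical parameter values and state, chosen only for the comparison.
k, omega, t = 0.5, 2.0, 1.3
y = [0.1, 0.2, 0.3]
print(nonautonomous_rhs(t, y, k, omega))
print(autonomous_rhs(y + [omega * t], k, omega))
```

The autonomous version can be handed to any integrator that expects a function of the state alone. The price is the extra component, whose derivative is the constant $\omega$.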

Based on the prescription above, we turn the second-order DE of the driven pendulum into a set of first-order DEs. First we rewrite the DE describing a general pendulum [see Equation (31.21)] as

$$\ddot\theta + \gamma\dot\theta + \sin\theta = \phi_0\cos\alpha,$$

where $\alpha$ is simply $\omega_D t$. Then we turn this equation into the following entirely equivalent set of three first-order DEs:

$$\dot\theta = \omega, \qquad \dot\omega = -\gamma\omega - \sin\theta + \phi_0\cos\alpha, \qquad \dot\alpha = \omega_D. \tag{31.23}$$

The two-dimensional $(\theta, \omega)$ phase space has turned into a three-dimensional $(\theta, \omega, \alpha)$ phase space. But the resulting system is autonomous.
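The system (31.23) can be integrated with any standard ODE solver. The minimal sketch below uses a hand-written classical fourth-order Runge-Kutta step (our choice; the text does not say which method produced its figures) and, as a sanity check that is our addition, monitors the dimensionless energy $\omega^2/2 - \cos\theta$, which should stay constant when $\gamma = \phi_0 = 0$. The two runs mimic an oscillation starting at $\pi/10$ and a rotation starting at $(-\pi, 1)$, as in Figures 31.7 and 31.8.

```python
import math


def pendulum_rhs(state, gamma, phi0, omega_D):
    """Right-hand side of the autonomous system (31.23); state = (theta, omega, alpha)."""
    theta, omega, alpha = state
    return (omega, -gamma * omega - math.sin(theta) + phi0 * math.cos(alpha), omega_D)


def rk4_step(state, dt, gamma, phi0, omega_D):
    """One classical fourth-order Runge-Kutta step for pendulum_rhs."""
    def shifted(k, h):
        return tuple(s + h * ki for s, ki in zip(state, k))

    k1 = pendulum_rhs(state, gamma, phi0, omega_D)
    k2 = pendulum_rhs(shifted(k1, dt / 2), gamma, phi0, omega_D)
    k3 = pendulum_rhs(shifted(k2, dt / 2), gamma, phi0, omega_D)
    k4 = pendulum_rhs(shifted(k3, dt), gamma, phi0, omega_D)
    return tuple(s + dt / 6 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))


def simulate(state, t_end, dt, gamma=0.0, phi0=0.0, omega_D=0.0):
    """List of states from t = 0 to t_end (defaults: no damping, no drive)."""
    states = [tuple(state)]
    for _ in range(int(round(t_end / dt))):
        states.append(rk4_step(states[-1], dt, gamma, phi0, omega_D))
    return states


def energy(state):
    """Dimensionless energy omega^2/2 - cos(theta); conserved when gamma = phi0 = 0."""
    theta, omega, _ = state
    return 0.5 * omega ** 2 - math.cos(theta)


osc = simulate((math.pi / 10, 0.0, 0.0), t_end=20.0, dt=0.01)  # oscillation
rot = simulate((-math.pi, 1.0, 0.0), t_end=20.0, dt=0.01)      # rotation
print(max(abs(energy(s) - energy(osc[0])) for s in osc))       # energy drift
print(max(s[0] for s in osc), rot[-1][0])
```

The oscillation stays bounded by its initial angle, while the rotation's angle keeps growing, which are the two kinds of closed and open curves in Figure 31.8.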

Just as in the linear case, it is instructive to ignore the damping and driving forces first. We set $\gamma$ and $\phi_0$ equal to zero in Equation (31.23) and solve the set of DEs numerically.⁵ For small angles, we expect a simple harmonic motion (SHM). So, with $\theta(0) = \pi/10$ and $\omega(0) = 0$,⁶ we obtain the plot on the left of Figure 31.7. This plot shows a simple trigonometric dependence of angle on time.

5 The solution can be given in terms of elliptic functions as discussed in Chapter 11.

The initial angular displacement of the plot on the right of Figure 31.7 is approximately $\pi$ radians, corresponding to raising the mass of the pendulum all the way to the top.⁷ The flattening of the curves at the maxima and minima of the plot indicates that the pendulum almost stops once it reaches the top and momentarily remains motionless there. This is expected physically, as $\theta(0) = \pi$ is a location of (unstable) equilibrium, i.e., with $\omega(0) = 0$, the pendulum can stay at the top forever. So, for $\theta(0) \approx \pi$, the pendulum is expected to stay near the top, not forever, but for a “long” time.

The phase space diagram of the pendulum can give us much information about its behavior. With the initial angular velocity set at zero, the pendulum will exhibit a periodic behavior represented by closed loops in the phase space. Figure 31.8 shows four such closed loops corresponding (from small to large loops) to the initial angular displacements of $\pi/5$, $\pi/2$, $2\pi/3$, and (almost) $\pi$. These loops represent oscillations only: the angular displacement is bounded between a minimum and a maximum value determined by $\theta(0)$. The closed loops are characterized by the fact that the angular speed vanishes at maximum (or minimum) $\theta$, allowing the pendulum to start moving in the opposite direction.

Figure 31.7: The undamped undriven pendulum shows an SHM for small initial angles (the plot on the left has a maximum angle of $\pi/10$). For large angles, the motion is periodic but not an SHM. The maximum angle of the plot on the right is slightly less than $\pi$.

Figure 31.8: Phase space diagrams for a pendulum corresponding to different values of maximum displacement angles (horizontal axis). The inner diagrams correspond to smaller values; the outermost plot has a maximum angle of 179.98 degrees at which the angular speed is 1.

6 It is important to keep $\omega(0)$ small, because a large initial angular velocity (even at a small initial angle) can cause the pendulum to reach very large angles!

7 For this to be possible, clearly the mass should be attached to a rigid rod (not a string)!

The outermost curves result from $\theta(0) = -\pi$, $\omega(0) = 1$ (the upper curve), and $\theta(0) = \pi$, $\omega(0) = -1$ (the lower curve), and represent rotations. The angular displacement is unbounded: it keeps growing in magnitude for all times. Physically, this corresponds to forcing the pendulum to “go over the hill” at the top by providing it an initial angular velocity. If the pendulum is pushed over this hill once, it will continue doing it forever because there is no damping force. The rotations are characterized by a nonzero angular velocity at $\theta = \pm\pi$. This is clearly shown in Figure 31.8.

What happens when the damping force is turned on? We expect the trajectories to spiral into the origin of the phase space as in the case of the linear (small-angle) pendulum. Figure 31.9 shows two such trajectories corresponding to an initial displacement of just below $\pi$ (on the right), and just above $-\pi$ (on the left). For both trajectories, the initial angular velocity is zero. It is intuitively obvious that, regardless of the initial conditions, the pendulum will eventually come to a stop at $\theta = 0$ (or at an equivalent angle $2n\pi$) if there are no driving forces acting on it. So, Figure 31.9 is really representative of all dissipative motions of the pendulum.

The origin of the phase space is a fixed point of the pendulum dynamics. But it is not the only one. In general, any point in the phase space at which the time derivatives of all coordinates of the trajectory are zero is a fixed point (see Problem 31.6). If we set all the functions on the RHS of Equation (31.23) equal to zero,⁸ we obtain

ω = 0 and sin θ = 0,

corresponding to infinitely many fixed points at (nπ, 0) with n an integer.
Points in the neighborhood of the origin, i.e., those lying in the basin of attraction of the origin, are attracted to the origin; for such points, the rest of the fixed points are repellors (or unstable).

Figure 31.9: Phase space diagrams for a dissipative pendulum. Two trajectories starting at θ ≈ −π, ω = 0 and θ ≈ π, ω = 0 eventually end up at the origin.

⁸ Still assuming no driving force.

The interesting motion of a pendulum begins when we turn on a driving force, regardless of whether the dissipative effect is present or not. Nevertheless, let us place the pendulum in an environment in which γ = 0.3. Now drive this pendulum with a (harmonic) force of amplitude φ₀ = 0.5 and angular frequency ω_D = 1. A numerical solution of (31.23) will then give a result which has a transient motion lasting until t ≈ 32. From t = 32 onward, the system traverses a closed orbit in the phase diagram as shown in Figure 31.10. This orbit is an attractor in the same sense as a point is an attractor for the logistic map and for a dissipative nondriven pendulum. An attractor such as the one exhibited in Figure 31.10 is called a limit cycle.
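A numerical solution of the kind described above can be sketched in a few lines of Python. The sketch assumes that Equation (31.23) has the form dθ/dt = ω, dω/dt = −γω − sin θ + φ₀ cos φ, dφ/dt = ω_D, with φ the phase of the driving force; that form, the classical fourth-order Runge-Kutta integrator, the initial state (0, 0), and the helper names `pendulum_rhs`, `rk4_step`, and `integrate` are illustrative assumptions, not the book's own method.

```python
import math

def pendulum_rhs(state, gamma, phi0, omega_d):
    """Assumed form of Eq. (31.23): returns (dθ/dt, dω/dt, dφ/dt)."""
    theta, omega, phi = state
    return (omega,
            -gamma * omega - math.sin(theta) + phi0 * math.cos(phi),
            omega_d)

def rk4_step(state, dt, gamma, phi0, omega_d):
    """Advance (θ, ω, φ) by one classical fourth-order Runge-Kutta step."""
    def shifted(k, h):
        return tuple(s + h * ks for s, ks in zip(state, k))
    k1 = pendulum_rhs(state, gamma, phi0, omega_d)
    k2 = pendulum_rhs(shifted(k1, dt / 2), gamma, phi0, omega_d)
    k3 = pendulum_rhs(shifted(k2, dt / 2), gamma, phi0, omega_d)
    k4 = pendulum_rhs(shifted(k3, dt), gamma, phi0, omega_d)
    return tuple(s + dt / 6 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))

def integrate(theta0, omega0, gamma, phi0, omega_d, t_max, dt=0.01):
    """Return the states (θ, ω, φ) at t = 0, dt, 2dt, ..., t_max, starting with φ = 0."""
    state = (theta0, omega0, 0.0)
    trajectory = [state]
    for _ in range(int(round(t_max / dt))):
        state = rk4_step(state, dt, gamma, phi0, omega_d)
        trajectory.append(state)
    return trajectory

# Limit-cycle run of the text: gamma = 0.3, phi0 = 0.5, omega_D = 1 (initial state assumed).
orbit = integrate(0.0, 0.0, 0.3, 0.5, 1.0, 100.0)
late = [(theta, omega) for theta, omega, _ in orbit[3200:]]  # t >= 32, after the transient
```

Plotting the pairs in `late` should trace a closed curve like the one in Figure 31.10. Setting `phi0=0.0` turns off the drive, so `pendulum_rhs` then vanishes in its first two components at every (nπ, 0), in line with the fixed points found above.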

31.2.3 Onset of Chaos 

As we increase the control parameter φ₀, the phase space trajectories go through a series of periodic limit cycles until they finally become completely aperiodic: chaos sets in. Figure 31.11 shows four trajectories whose common initial angular displacement θ₀, initial angular velocity ω₀, damping factor γ, and drive frequency ω_D are, respectively, π, 0, 0.5, and 2/3. The only (control) parameter that is changing is the amplitude of the driving force φ₀. This changes from 0.97 for the upper left to 1.2 for the lower right diagram.

A closer scrutiny of Figure 31.11 (which we shall forgo) will reveal that the chaotic behavior of the diagram at the lower right takes place after the pendulum goes through a bifurcation process as in the case of the logistic map. However, unlike the logistic map, whose bifurcation stages were divided by fixed “points,” the stages for the pendulum are characterized by limit cycles. In fact, the diagram at the upper left, corresponding to φ₀ = 0.97, consists of two (very closely spaced) limit cycles. Bifurcations involving limit cycles are called Hopf bifurcations after the mathematician E. Hopf, who generalized the earlier results of Poincaré on such bifurcations to higher dimensions. The logistic map and the nonlinear pendulum have the following property in common: their “route to chaos” is via bifurcation. This is not true for all chaotic systems; there are other “routes to chaos,” but we shall not investigate them here.

Figure 31.10: The moderately driven dissipative pendulum with γ = 0.3 and φ₀ = 0.5. After a transient motion, the pendulum settles down into a closed trajectory.

Figure 31.11: Four trajectories in the phase space of the damped driven pendulum. The only difference in the plots is the value of φ₀, which is 0.97 for the upper left, 1.1 for the upper right, 1.15 for the lower left, and 1.2 for the lower right diagrams.

The main characteristic of chaos is the exponential divergence of neighboring trajectories. We have seen this behavior for the logistic map. A very nice illustration of this phenomenon for the nonlinear pendulum is depicted in Figure 31.12, where two nearby trajectories in the neighborhood of the point (−2, −2) are seen to diverge dramatically (in eight units of time).
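The divergence of nearby trajectories can be probed numerically with a sketch like the following. It assumes the same form of Equation (31.23) as dθ/dt = ω, dω/dt = −γω − sin θ + φ₀ cos φ, dφ/dt = ω_D, and, as a further assumption, the parameters of the lower-right panel of Figure 31.11 (γ = 0.5, ω_D = 2/3, φ₀ = 1.2), since the text does not state which run produced Figure 31.12. The initial offset `delta` and the helper names are hypothetical.

```python
import math

GAMMA, PHI0, OMEGA_D = 0.5, 1.2, 2 / 3  # assumed: lower-right panel of Figure 31.11

def rhs(s):
    """Assumed form of Eq. (31.23) for the state s = (θ, ω, φ)."""
    theta, omega, phi = s
    return (omega, -GAMMA * omega - math.sin(theta) + PHI0 * math.cos(phi), OMEGA_D)

def rk4(s, dt):
    """One classical fourth-order Runge-Kutta step."""
    k1 = rhs(s)
    k2 = rhs(tuple(x + dt / 2 * k for x, k in zip(s, k1)))
    k3 = rhs(tuple(x + dt / 2 * k for x, k in zip(s, k2)))
    k4 = rhs(tuple(x + dt * k for x, k in zip(s, k3)))
    return tuple(x + dt / 6 * (a + 2 * b + 2 * c + d)
                 for x, a, b, c, d in zip(s, k1, k2, k3, k4))

def separation_history(theta0, omega0, delta=1e-6, t_max=50.0, dt=0.01):
    """Distance in the θω-plane, after each step, between two runs whose initial θ differ by delta."""
    a = (theta0, omega0, 0.0)
    b = (theta0 + delta, omega0, 0.0)
    history = []
    for _ in range(int(round(t_max / dt))):
        a, b = rk4(a, dt), rk4(b, dt)
        history.append(math.hypot(a[0] - b[0], a[1] - b[1]))
    return history

hist = separation_history(-2.0, -2.0)
# ln(separation) versus t: a roughly linear rise signals exponential divergence.
log_sep = [math.log(d) for d in hist if d > 0]
```

In a chaotic regime one would expect `log_sep` to rise roughly linearly while the separation is small, which is what exponential divergence means; once the separation becomes comparable to the size of the attractor it stops growing.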

The divergence of trajectories and the ensuing chaos has been termed the butterfly effect by Lorenz who, in the title of one of his talks, asked the question: “Does the flap of a butterfly’s wings in Brazil set off a tornado in Texas?” The point Lorenz is making in this statement is that if the atmosphere displays chaotic behavior (as a simple model proposed by Lorenz predicts), then a very small disturbance, such as the flapping of a butterfly’s wings, would make it impossible to predict the long-term behavior of the weather.

Figure 31.12: The projections onto the θω-plane of two trajectories starting at approximately the same point near (−2, −2) diverge considerably after eight units of time. The loop does not contradict the DE uniqueness theorem!

In general, for a dynamical system obeying an autonomous set of first-order DEs to be chaotic, three requirements must be met:

1. The trajectories must not intersect. 

2. The trajectories must be bounded. 

3. Nearby trajectories ought to diverge exponentially. 

The first requirement is a direct consequence of the uniqueness theorem⁹ for the solution of DEs: if two trajectories cross, the system will have a “choice” for its further development starting at the intersection point, and this is not allowed. The very notion of a fixed point, as well as the crisscrosses of Figures 31.11 and 31.12, may appear to violate the first property above. However, we have to remind ourselves that fixed points are (asymptotically) achieved after an infinite amount of time. As for the two figures, recall that all plots in those figures are projections of the three-dimensional trajectories onto the θω-plane. The three-dimensional trajectories never cross (see Figure 31.13).

The second requirement is important because unbounded regions of phase space correspond to infinities, which are to be avoided. The third requirement is simply what defines chaos. It turns out that one- and two-dimensional phase spaces cannot accommodate all of these requirements. However, in three dimensions, one can “stretch” out the trajectories that want to loop in two dimensions, as shown in Figure 31.13, where the loop of Figure 31.12 is seen to have been only a two-dimensional shadow! Thus,

Box 31.2.1. A necessary condition for a system obeying autonomous first-order DEs to be chaotic is to have a phase space that has at least three dimensions.

Figure 31.13: The two trajectories of Figure 31.12 shown in the full three-dimensional phase space.

⁹ For our purposes this theorem states that if the dynamical variables and their first derivatives of a system are specified at some (initial) time, then the evolution of the system in time is uniquely determined. In the context of phase space this means that through any point in phase space only one trajectory can pass.


The reader may wonder how a (one-dimensional) pendulum can satisfy the condition of Box 31.2.1. After all, the phase space of such a pendulum has only two dimensions. The answer lies in the fact that although a driven pendulum, with only θ and ω = dθ/dt regarded as independent variables, obeys DEs that are not autonomous, when time is turned into the third dimension of the phase space, a set of three autonomous DEs results which allows chaotic behavior. This is in fact obvious from Equation (31.23), where φ, the third dimension of the phase space, is seen to be essentially time in units of 1/ω_D. A pendulum that is not driven does not exhibit chaotic behavior.


31.3 Universality of Chaos 

In the preceding sections, we examined two completely different systems displaying chaotic behavior. Although there are different “routes” to chaos, we shall concentrate only on the period-doubling route because it has been theoretically developed further than the other routes, and because it displays a universal character common to all such chaotic systems, as discovered by one of the founders of the theory of chaos, Mitchell Feigenbaum.

31.3.1 Feigenbaum Numbers 

In our theoretical investigation of the logistic map, we introduced the control parameters αₙ at which the nth bifurcation takes place and for which there are a number of 2ⁿ-cycle fixed points. It turns out that the ratio

$$\delta_n \equiv \frac{\alpha_n - \alpha_{n-1}}{\alpha_{n+1} - \alpha_n} \qquad (31.24)$$

is almost the same for all large n, and that, in the limit as n → ∞, it approaches a number δ*, now called the Feigenbaum delta:

$$\delta^* = \lim_{n\to\infty} \delta_n = \lim_{n\to\infty} \frac{\alpha_n - \alpha_{n-1}}{\alpha_{n+1} - \alpha_n} = 4.66920\ldots \qquad (31.25)$$

Feigenbaum looked at the same ratio for the so-called iterated sine function

$$x_{n+1} = \beta \sin(\pi x_n)$$

and found that exactly the same number was obtained in the limit. Later, he showed that δ* is the same for all iterated map functions!¹⁰

¹⁰ This is not entirely true. The map functions should have a parabolic “shape” at their maximum. The logistic map and the sine function, as well as many other functions, have this property.


We can use δ* to calculate approximations to αₙ for large values of n, and, in particular, to find an approximate value for α_∞. First we note that, if we approximate δₙ with δ*, then (31.24) yields

$$\alpha_{n+1} = \frac{\alpha_n - \alpha_{n-1}}{\delta_n} + \alpha_n \approx \frac{\alpha_n - \alpha_{n-1}}{\delta^*} + \alpha_n.$$


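For example,

$$\alpha_3 \approx \frac{\alpha_2 - \alpha_1}{\delta^*} + \alpha_2, \qquad \alpha_4 \approx \frac{\alpha_3 - \alpha_2}{\delta^*} + \alpha_3,$$

or

$$\alpha_4 \approx \frac{(\alpha_2 - \alpha_1)/\delta^*}{\delta^*} + \alpha_3 = (\alpha_2 - \alpha_1)\left(\frac{1}{\delta^*} + \frac{1}{\delta^{*2}}\right) + \alpha_2.$$

We can easily generalize this to

$$\alpha_N \approx (\alpha_2 - \alpha_1)\left(\frac{1}{\delta^*} + \frac{1}{\delta^{*2}} + \cdots + \frac{1}{\delta^{*(N-2)}}\right) + \alpha_2. \qquad (31.26)$$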


In the limit that N → ∞, the sum becomes a geometric series which adds up to 1/(δ* − 1). So,

$$\alpha_\infty \approx \frac{\alpha_2 - \alpha_1}{\delta^* - 1} + \alpha_2. \qquad (31.27)$$

With α₁ = 3 and α₂ = 1 + √6, we obtain

$$\alpha_\infty \approx \frac{\sqrt{6} - 2}{3.66920} + 1 + \sqrt{6} = 3.572.$$

The actual value, obtained by more elaborate calculations, is 3.5699....
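Equations (31.26) and (31.27) are easy to evaluate numerically. The sketch below uses only the values quoted in the text (α₁ = 3, α₂ = 1 + √6, δ* ≈ 4.66920); the function and constant names are hypothetical.

```python
import math

DELTA_STAR = 4.66920        # Feigenbaum delta, Eq. (31.25), to the digits quoted
ALPHA_1 = 3.0               # onset of the 2-cycle
ALPHA_2 = 1 + math.sqrt(6)  # onset of the 4-cycle

def alpha_estimate(N, alpha1=ALPHA_1, alpha2=ALPHA_2, delta=DELTA_STAR):
    """Eq. (31.26): (alpha2 - alpha1)(1/δ* + ... + 1/δ*^(N-2)) + alpha2, for N >= 2."""
    series = sum(delta ** (-k) for k in range(1, N - 1))
    return (alpha2 - alpha1) * series + alpha2

def alpha_infinity(alpha1=ALPHA_1, alpha2=ALPHA_2, delta=DELTA_STAR):
    """Eq. (31.27): the geometric series summed to infinity."""
    return (alpha2 - alpha1) / (delta - 1) + alpha2

print(round(alpha_infinity(), 3))  # 3.572
```

`alpha_estimate(N)` approaches `alpha_infinity()` quickly as N grows, because the neglected terms shrink like powers of 1/δ*.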

Another quantity that seems to be universal is the ratio of consecutive “bifurcation sizes.” We mentioned earlier that there are several fixed points associated with the 2ⁿ-cycle parameter αₙ. At each stage of bifurcation, these fixed points come in pairs. For example, at α₂ = 1 + √6, Equation (31.16) gives the two fixed points x = 0.849938 and x = 0.43996. We define the “size” d₂ of the 4-cycle bifurcation as the (absolute value of the) difference between these x-values. In general, we define dₙ, the size of the bifurcation pattern of period 2ⁿ, as the largest (in absolute value) of the differences between the two x-values of each pair of fixed points. On a bifurcation diagram, one would measure the vertical distance between the points where each curve of the diagram starts to branch out. If there are several such distances, one chooses the largest one. The second Feigenbaum number, the so-called Feigenbaum alpha, is then defined as

$$\alpha^* = \lim_{n\to\infty} \frac{d_n}{d_{n+1}} = 2.5029\ldots \qquad (31.28)$$

Feigenbaum found that this number is obtained for the bifurcation pattern of all chaotic systems which reach chaos via bifurcation.
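The two fixed points quoted above for α₂ = 1 + √6 can be checked with the closed-form period-2 points of the logistic map x → αx(1 − x), namely x = [(α + 1) ± √((α + 1)(α − 3))]/(2α). The sketch assumes this standard formula is what Equation (31.16) expresses; the helper names are hypothetical.

```python
import math

def logistic(x, alpha):
    """One step of the logistic map."""
    return alpha * x * (1 - x)

def two_cycle(alpha):
    """Period-2 points of x -> alpha x (1 - x); real only for alpha > 3."""
    root = math.sqrt((alpha + 1) * (alpha - 3))
    return ((alpha + 1) + root) / (2 * alpha), ((alpha + 1) - root) / (2 * alpha)

alpha2 = 1 + math.sqrt(6)
x_plus, x_minus = two_cycle(alpha2)
d2 = abs(x_plus - x_minus)  # the "size" of the bifurcation at alpha_2
print(round(x_plus, 6), round(x_minus, 6))
```

Each point is mapped onto the other, so applying `logistic` twice returns the starting value, as expected for a 2-cycle.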
Aside from its universality as applied to different chaotic systems, this number suggests a general “size” scaling within the bifurcation pattern of a single system: for large enough values of n, each bifurcation is smaller than the previous one by (nearly) the same factor α*. If we “blow up” the small bifurcations taking place for large values of n, they look almost identical to the ones occurring before them. This property is also called self-similarity.


31.3.2 Fractal Dimension 

An elegant way of quantifying chaos is by examining the geometric properties of the trajectory of the chaotic system under study. Suppose we let the system run for a long time and suppose that it gravitates toward an attractor and remains there.¹¹ What is the “dimension” of the trajectory? The clarification of this question and the logic (as well as the application) of its answer is the subject of this subsection.

Intuitively, one assigns the dimension 0 to points, 1 to curves, 2 to surfaces, 3 to volumes, and n to “solid” objects residing in spaces requiring n coordinates to describe their points. How can we go beyond intuition? We use the so-called Hausdorff dimension, whose calculation goes as follows. Try to cover the geometrical object by appropriate “boxes” of side length r. Now count the number N(r) of boxes required to contain all points of the geometric object. The Hausdorff dimension D is defined by

$$N(r) \simeq k\,r^{-D} \quad \text{as } r \to 0, \qquad (31.29)$$

where k is an inessential proportionality constant which describes the shape of the “box.” For example, as a box, we could use a “sphere” of radius r. Then the “volume” would be 2r for a line, πr² for a disk, and (4/3)πr³ for a ball. Thus, k is 2, π, or 4π/3. If we choose “cubes,” k will always be 1. Furthermore, by changing the unit of length, one can change k. Fortunately, as we shall see shortly, k will not enter the final definition of the Hausdorff dimension.

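Equation (31.29) can be solved for D:

$$D = \lim_{r\to 0}\left(-\frac{\ln N(r)}{\ln r} + \frac{\ln k}{\ln r}\right).$$

Now we can see why k is not essential: as r → 0, the denominator of the second term grows without bound (ln r → −∞), so the second term vanishes. So,

$$D = -\lim_{r\to 0} \frac{\ln N(r)}{\ln r}. \qquad (31.30)$$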

Let us test (31.30) on some familiar geometric objects. If the object is a 
single point on a line, then only one “box” is needed to cover it regardless of 
the size of the box. So, N(r) = 1 for all r, and Equation (31.30) gives D = 0. 

¹¹ By an attractor, we mean any geometrical object on which the trajectory hovers. It can be a fixed point, a limit cycle, or some multidimensional object in the phase (hyper-)space.
In fact, the Hausdorff dimension of any finite number of points on a line is found to be zero. Similarly, the dimension of a finite number of points on a surface or in a volume is also zero.

If the object is a surface of area A, then we require A/r² boxes (squares) to cover the entire area. Thus,

$$D = -\lim_{r\to 0} \frac{\ln(A/r^2)}{\ln r} = -\lim_{r\to 0} \frac{\ln A - 2\ln r}{\ln r} = 2.$$

Similarly, the reader may check that the Hausdorff dimension of a curve is 1, and of a volume it is 3. So, the formula seems to work for familiar geometric objects.

Example 31.3.1. A not-so-familiar geometric object is the Cantor set: Take the closed interval [0, 1]; remove its middle third; do the same with the remaining two segments; continue the process ad infinitum (Figure 31.14). What is left of the line segment is the Cantor set, named after the German mathematician whose work on set theory, controversial at the time, laid the foundation of modern formal mathematics. Figure 31.14 should convince the reader that after n steps, 2ⁿ segments are left and that the length of each segment is (1/3)ⁿ. Thus, denoting the size of the box after n steps by rₙ, we have

$$r_n = (1/3)^n, \qquad N(r_n) = 2^n.$$

Therefore,

$$D = -\lim_{r\to 0}\frac{\ln N(r)}{\ln r} = -\lim_{n\to\infty}\frac{\ln N(r_n)}{\ln r_n} = -\lim_{n\to\infty}\frac{\ln 2^n}{\ln[(1/3)^n]} = -\lim_{n\to\infty}\frac{n\ln 2}{n\ln(1/3)} = -\frac{\ln 2}{\ln(1/3)} = \frac{\ln 2}{\ln 3} = 0.6309. \qquad (31.31)$$

So, the Cantor set is more than just a set of points (dimension zero) and less than a line segment (dimension one). It is amusing to note, as the reader may verify, that the length of the Cantor set is zero! ■
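The counting argument of Example 31.3.1 can be mirrored by a small box-counting sketch. To avoid rounding errors, it represents each segment remaining after n dissections by the integer position of its left end in units of 3⁻ⁿ; this encoding and the helper names are illustrative assumptions.

```python
import math

def cantor_left_ends(n):
    """Left ends, in units of 3**-n, of the 2**n segments left after n dissections."""
    ends = [0]
    for _ in range(n):
        # old segment [a, a+1] becomes [3a, 3a+3] in new units; keep [3a, 3a+1] and [3a+2, 3a+3]
        ends = [3 * a for a in ends] + [3 * a + 2 for a in ends]
    return ends

def total_length(n):
    """Total length (2/3)**n of the segments left after n dissections."""
    return len(cantor_left_ends(n)) / 3 ** n

def box_count(n, k):
    """Number of boxes of side 3**-k (1 <= k <= n) that contain a segment of step n."""
    width = 3 ** (n - k)  # box side measured in units of 3**-n
    return len({a // width for a in cantor_left_ends(n)})

def dimension_estimate(n, k):
    """-ln N(r) / ln r with r = 3**-k, i.e. Eq. (31.30) before taking the limit."""
    r = 3.0 ** (-k)
    return -math.log(box_count(n, k)) / math.log(r)

print(box_count(8, 5), round(dimension_estimate(10, 6), 4))  # 32 0.6309
```

Because the boxes of side 3⁻ᵏ line up with the segments of the kth dissection, `box_count(n, k)` is exactly 2ᵏ, so every estimate equals ln 2/ln 3; `total_length(n)` shrinks like (2/3)ⁿ, consistent with the zero length of the Cantor set.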


The Cantor set is only one example of geometrical objects whose dimensions are nonintegers:

Box 31.3.1. A geometrical object whose Hausdorff dimension is not an integer is called a fractal object or simply a fractal.

Figure 31.14: The Cantor set after one, two, three, and four “dissections.”


Example 31.3.2. Another example of a fractal object is the so-called Koch snowflake. Start with an equilateral triangle of side L [Figure 31.15(a)]; remove the middle third of each side and replace it with two identical segments (a “wedge”) to form a star [Figure 31.15(b)]. Do the same to the small segments so obtained [Figure 31.15(c)], and continue ad infinitum. The result is the Koch snowflake.

Let us find the Hausdorff dimension of the Koch snowflake.¹² It should be clear that the number of line segments on each side of the triangle at step n is 4ⁿ, so that N(rₙ) = 3 × 4ⁿ, and the length rₙ of each line segment is L/3ⁿ. Therefore, the Hausdorff dimension of the Koch snowflake is

$$D = -\lim_{n\to\infty}\frac{\ln N(r_n)}{\ln r_n} = -\lim_{n\to\infty}\frac{\ln(3\times 4^n)}{\ln(L/3^n)} = -\lim_{n\to\infty}\frac{\ln 3 + n\ln 4}{\ln L - n\ln 3} = \frac{\ln 4}{\ln 3} = 1.2618595\ldots \qquad (31.32)$$

The length of the perimeter of the snowflake is

$$\lim_{n\to\infty} N(r_n)\,r_n = \lim_{n\to\infty} (3\times 4^n)\left(\frac{L}{3^n}\right) = 3L\lim_{n\to\infty}\left(\frac{4}{3}\right)^n = \infty.$$

It is interesting to note that the area enclosed by the Koch snowflake is finite while its perimeter is infinite! ■
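The snowflake formulas can be evaluated in the same spirit. In the sketch below, exact fractions keep the perimeter free of rounding, and the dimension estimate works with logarithms of integers so that rₙ never underflows; the function names, and the requirement that L be a positive integer, are assumptions of the sketch.

```python
import math
from fractions import Fraction

def koch_counts(n, L=1):
    """Segment count N(r_n) = 3 * 4**n and segment length r_n = L / 3**n (L an integer)."""
    return 3 * 4 ** n, Fraction(L, 3 ** n)

def koch_perimeter(n, L=1):
    """Exact perimeter N(r_n) * r_n = 3L(4/3)**n after n steps."""
    N, r = koch_counts(n, L)
    return N * r

def koch_dimension_estimate(n, L=1):
    """-ln N(r_n) / ln r_n for n >= 1, with ln r_n = ln L - ln 3**n computed from integers."""
    N = 3 * 4 ** n
    ln_r = math.log(L) - math.log(3 ** n)
    return -math.log(N) / ln_r

print(koch_perimeter(1), koch_perimeter(2))  # 4 16/3
```

With L = 1 the estimate equals ln 4/ln 3 + 1/n exactly, so it approaches the limit in (31.32) only slowly, while the perimeter grows without bound.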


The fractals discussed so far have the property which we called self-similarity. The present case, however, exhibits a true (or regular) self-similarity because, as we scale the object, we obtain an exact replica of the original. In contrast, for the logistic map, we obtained bifurcations which contained different scaling ratios: the ratio was α* only for the largest bifurcation size at each stage. Comparison of the largest size with smaller sizes would not have yielded α*. These (irregular) self-similarities occur frequently in chaotic systems, and the determination of their Hausdorff dimension can give information about the long-term behavior of the dynamics of the system.

Figure 31.15: (a) Begin with an equilateral triangle. (b) Remove the middle third of each side and replace it with a “wedge” to form a star. (c) Remove the middle third of each new segment and replace them with “wedges.” Continue ad infinitum to obtain the Koch snowflake.

¹² This is the dimension of the perimeter, not the area, of the snowflake.

In the case of the logistic map, the Hausdorff dimension of the set of vertical (fixed) points on the bifurcation diagram, which is zero at all finite stages of bifurcation, will not be zero at α_∞. It has, in fact, been calculated to be 0.5388.... This is an example of attractors that have noninteger dimensions, i.e., they are neither points nor lines. If the attractor of a dissipative dynamical system has a fractal dimension, then we say that the system has a strange attractor. Strange attractors play a fundamental role in the theory of chaos.


31.4 Problems 


31.1. Show that (31.1) leads to (31.2).

31.2. For the logistic map, assume that 1 < α < 3. Show that if xₖ > 1 − 1/α, then xₖ₊₁ < xₖ, and if xₖ < 1 − 1/α, then xₖ₊₁ > xₖ. Therefore, conclude that x* = 1 − 1/α is a stable fixed point.

31.3. Write the cubic polynomial in Equation (31.10) as 


a 


3 



(; x 2 + ax + b) 


and determine a and b by expanding and comparing the result with (31.10). 
Now solve x² + ax + b = 0 to obtain x₂(α) and x₃(α) of (31.11).

31.4. Derive Equation (31.21) from the equation that precedes it. 

31.5. Convince yourself that a system of N particles in space has a 6N-dimensional phase space.

31.6. Consider a set of autonomous first-order DEs. Suppose that a point P 
of the phase space is a root of all functions on the RHS. By expanding each 
coordinate of a trajectory in a Maclaurin series in t and keeping only the first 
two terms, show that the trajectory does not move away from P. So, fixed 
points are determined by setting all functions on the RHS of an autonomous 
system equal to zero and solving for the coordinates. 

31.7. Derive Equation (31.27) from (31.26). 

31.8. Show that the dimension of a finite number of points on a surface or 
in a volume is zero. 
31.9. Show that the Hausdorff dimension of any finite number of points is
zero, of a curve is 1, and of a volume is 3. 

31.10. Show that the Hausdorff dimension of the Cantor set is independent 
of the length of the original line segment. 

31.11. Verify that the length of the Cantor set is zero. 




Chapter 32 


Probability Theory 


Although probability theory did not flourish until after the Renaissance, and in particular in the 17th and 18th centuries, its roots go back to ancient history. Archaeological excavations reveal the presence of knucklebones (or astragali) in numbers far larger than any other kind of bones, indicating the possibility of the use of these bones in games. There is strong evidence that astragali were in use for board games at the time of the First Dynasty in Egypt (c. 3500 B.C.). Other archaeological excavations, unearthing more recent periods, e.g., 1300 B.C. in Turkey, also reveal a definite connection between astragali and recreation.

It seems that games of chance, such as the board game mentioned above, are, like counting, as old as civilization itself. Yet the science of counting, arithmetic, was already in an advanced stage of evolution when probability started to take root as a mathematical science in the 17th century. Why? Perhaps the reason is the crudeness with which “randomizers” such as dice (the artificial substitutes of astragali) were made for a long time. Abstraction requires perfection. Although the abstraction of counting from what was being counted took place naturally, the corresponding abstraction of randomness from what is random demanded an ideal device capable of producing random events and a large body of experimental data for analysis, and this did not happen until well into the 17th century.


32.1 Basic Concepts 

The reader no doubt has some familiarity with the notion of a random event. Any occurrence or experiment whose outcome is uncertain is such an event. Flipping a coin, pulling a card out of a deck of cards, and throwing a die are all examples of experiments whose outcomes are uncertain (if the coin, the deck of cards, and the die are all “unbiased”). The reader may also know intuitively that the chance of getting a head in the toss of a coin is 50% (or 1 out of 2, or 0.5); that the chance of getting a 3 in the throw of a die is 1 out of 6; and that the chance of getting a club in drawing a card is 1 out of 4.
The aim of the theory of probability is to make precise these intuitive notions 
and to develop a mathematical procedure for answering questions related to 
random events. First we need to review some simple concepts from set theory. 

32.1.1 A Set Theory Primer 

The most fundamental entity in any branch of mathematics is a universal set. It is the collection of all objects under consideration. For example, the universal set of plane geometry is a flat surface, and that of solid geometry is three-dimensional space. The universal set of calculus is the set of real numbers (or the real line), and the complex plane is the universal set of complex analysis. The generic universal set is denoted by 𝕊, but each specific universal set has its own symbol: ℝ is the set of real numbers, ℂ is the set of complex numbers, ℤ is the set of integers, and ℕ is the set of natural numbers (nonnegative integers).

The simplest relation in set theory is that of belonging. We write a ∈ 𝕊 (and say “a belongs to 𝕊” or “a is in 𝕊”) to express the fact that a is one of the objects in 𝕊. An object in 𝕊 is called an element of 𝕊. A collection A of elements of 𝕊 is called a subset of 𝕊, and we write A ⊂ 𝕊. In particular, 𝕊 ⊂ 𝕊. Any subset can be considered as a set with its elements and subsets. Thus, a ∈ A means that a is one of the elements of the subset A, a ∉ A means that a is not one of the elements of A, and B ⊂ A means that B consists of elements, all of which belong to A. Subsets are often specified either by enumeration or by some statement enclosed between a pair of curly brackets. For example,

$$\{0, 1, 2, 3, \ldots\}, \quad \{2, 4, 6, \ldots\}, \quad \{2n+1 \mid n \in \mathbb{N}\}, \quad \{(x, x) \mid x \in \mathbb{R}\}, \quad \left\{-\frac{13.6}{n^2} \,\middle|\, n \in \mathbb{N},\ n \neq 0\right\}.$$

The first describes ℕ; the second, the set of even numbers; the third, the set of odd numbers; the fourth, the line y = x; and the fifth, the energy levels of the hydrogen atom in electron volts. Two subsets are equal if each is a subset of the other. In other words, if A ⊂ B and B ⊂ A, then A = B. It is convenient to introduce the empty set, a subset ∅ of 𝕊, which has no elements.

The subsets of a universal set have a rich mathematics which we can only briefly outline here. Given two sets¹ A and B, we can form another set, called the union of A and B and denoted by A ∪ B, which consists of all elements belonging to either A or B or both. Thus,

$$A \cup B = \{x \in \mathbb{S} \mid x \in A \text{ or } x \in B\}.$$

¹ It is very common to delete the prefix ‘sub’ and refer to subsets of a universal set as simply sets.
The intersection of A and B, denoted by A ∩ B, consists of all elements that belong to both A and B:

$$A \cap B = \{x \in \mathbb{S} \mid x \in A \text{ and } x \in B\}.$$

The complement of a set A is the subset of 𝕊 which contains all the elements of 𝕊 that are not in A. Denoting this set by Aᶜ, we have

$$A^c = \{x \in \mathbb{S} \mid x \notin A\}.$$

The reader may easily verify that 𝕊 = A ∪ Aᶜ and ∅ = A ∩ Aᶜ. When A ∩ B = ∅, we say that A and B are disjoint.

The operations of union and intersection are commutative and associative:

$$A \cup B = B \cup A, \qquad A \cap B = B \cap A,$$
$$(A \cup B) \cup C = A \cup (B \cup C), \qquad (A \cap B) \cap C = A \cap (B \cap C).$$

Thus one can take the union and intersection of a number of sets without worrying about the order of the sets or where to put the parentheses. This makes it possible to introduce the following notations for the union and intersection of a family of sets:

$$\bigcup_{i=1}^{n} A_i = A_1 \cup A_2 \cup \cdots \cup A_n, \qquad \bigcap_{i=1}^{n} A_i = A_1 \cap A_2 \cap \cdots \cap A_n. \qquad (32.1)$$

We define the difference between two sets, A − B = A ∩ Bᶜ, as the collection of elements in A that are not in B. It is not hard to show that A − B, B − A, and A ∩ B are mutually disjoint. Furthermore,

$$\begin{aligned} A &= (A - B) \cup (A \cap B), \\ B &= (B - A) \cup (A \cap B), \\ A \cup B &= (A - B) \cup (A \cap B) \cup (B - A). \end{aligned} \qquad (32.2)$$

Note that all sets on the right-hand side of each equation are mutually disjoint.

A useful way of picturing sets and operations on them is a Venn diagram. The universal set is depicted as a rectangle, and its subsets as circles in the rectangle. Figure 32.1 shows some examples of the use of Venn diagrams. Venn diagrams are intuitive representations of relations among sets. For example, the diagram on the right of Figure 32.1 shows clearly the equalities of Equation (32.2).

Using Venn diagrams, one can show that the operation of union distributes over intersection and vice versa:
$$A \cap (B \cup C) = (A \cap B) \cup (A \cap C), \qquad A \cup (B \cap C) = (A \cup B) \cap (A \cup C), \qquad (32.3)$$

and more generally,

$$A \cap \left(\bigcup_{i=1}^{n} B_i\right) = \bigcup_{i=1}^{n} (A \cap B_i), \qquad A \cup \left(\bigcap_{i=1}^{n} B_i\right) = \bigcap_{i=1}^{n} (A \cup B_i). \qquad (32.4)$$

Figure 32.1: Venn diagrams of some sets. The grey region represents the set labeled at the bottom.
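Identities such as (32.2) through (32.4) can also be checked by brute force on a small universal set. The sketch below uses Python's built-in `frozenset` operators (`|` for ∪, `&` for ∩, and `S - X` for the complement of X); the three-element universal set and the helper names are arbitrary choices for illustration, and checking finitely many cases illustrates the identities rather than proves them.

```python
from functools import reduce
from itertools import chain, combinations, product

S = frozenset({1, 2, 3})  # small universal set, chosen only for illustration

def all_subsets(universe):
    """Every subset of `universe`, from the empty set to the universe itself."""
    items = sorted(universe)
    return [frozenset(c)
            for c in chain.from_iterable(combinations(items, r) for r in range(len(items) + 1))]

def diff(X, Y):
    """Set difference X − Y = X ∩ Y^c, with the complement taken in S."""
    return X & (S - Y)

def union(family):
    return reduce(frozenset.union, family, frozenset())

def intersection(family):
    return reduce(frozenset.intersection, family, S)

def identities_hold(A, B, C):
    """True if the identities of this section hold for the subsets A, B, C of S."""
    Ac = S - A
    return all([
        S == A | Ac and not (A & Ac),                        # S = A ∪ A^c, ∅ = A ∩ A^c
        A == diff(A, B) | (A & B),                           # (32.2), first line
        A | B == diff(A, B) | (A & B) | diff(B, A),          # (32.2), third line
        not (diff(A, B) & (A & B)) and not (diff(A, B) & diff(B, A)),  # disjoint pieces
        A & (B | C) == (A & B) | (A & C),                    # (32.3)
        A | (B & C) == (A | B) & (A | C),                    # (32.3)
        A & union([B, C]) == union([A & B, A & C]),          # (32.4) with n = 2
        A | intersection([B, C]) == intersection([A | B, A | C]),
    ])

subs = all_subsets(S)
print(all(identities_hold(A, B, C) for A, B, C in product(subs, repeat=3)))  # True
```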


32.1.2 Sample Space and Probability 

The underlying concept in probability theory is the sample space, which is the same as the universal set of set theory and is also denoted by 𝕊. It is the collection of all possible outcomes of an experiment. For example, for the toss of a coin, 𝕊 = {H, T}; for the toss of two coins, 𝕊 = {HH, HT, TH, TT}; and for a die, 𝕊 = {1, 2, 3, 4, 5, 6}. An event E is simply a subset of the sample space. Thus, the event {HT, TH} describes the outcome in the toss of two coins in which exactly one of the coins shows heads; and {2, 4, 6} is the event that the roll of a die produces an even number. An event, therefore, can be elementary or compound, with the latter being a collection of the former.

We are now ready to define probability. Since the sample space, which is now also called the probability space, includes all possible events, its probability should be one, corresponding to absolute certainty. The probability of any event (any subset of the probability space) has to be a nonnegative number no greater than one. We may be tempted to say that the probability of the union of two events is the sum of their probabilities, but that would be wrong. For example, let 𝕊 = {1, 2, 3, 4, 5, 6} be the universal set of a die, and consider E₁ = {1, 3, 5}, the odd outcomes, and E₂ = {4, 5, 6}, all outcomes greater than 3. Intuitively, we know that the probability for each of these two events is 1/2. But E₁ ∪ E₂ = {1, 3, 4, 5, 6}, and if we were to add probabilities for the union, we would get that the probability of {1, 3, 4, 5, 6} is one, which is clearly wrong. The reason for this is that we have actually double-counted {5}, the intersection of the two sets. Only if the two sets are disjoint can we add the probabilities for the union. Now we define probability:









Box 32.1.1. 𝕊 is called a probability space if for each event E ⊂ 𝕊 there is a number P(E) satisfying the following conditions.

1. 0 ≤ P(E).

2. P(𝕊) = 1.

3. If E₁ and E₂ are disjoint events, then P(E₁ ∪ E₂) = P(E₁) + P(E₂).

Example 32.1.1. In this example we derive some relations involving probabilities.

(1) If E₁ ⊂ E₂, then P(E₁) ≤ P(E₂). To show this, use the first equation in (32.2) and the fact that E₁ ∩ E₂ = E₁ to write

E₂ = (E₂ − E₁) ∪ E₁ and P(E₂) = P(E₂ − E₁) + P(E₁),

where the second equality follows from condition 3 of Box 32.1.1 because E₂ − E₁ and E₁ are disjoint. Since P(E₂ − E₁) is nonnegative, we get P(E₂) ≥ P(E₁).

(2) P(E) ≤ 1 for every event E. This is a consequence of (1), because E ⊂ 𝕊 and P(𝕊) = 1.

(3) For any two events E₁ and E₂,

$$P(E_1 \cup E_2) = P(E_1) + P(E_2) - P(E_1 \cap E_2). \qquad (32.5)$$

To prove this, use Equation (32.2) and condition 3 of Box 32.1.1 to write

$$\begin{aligned} P(E_1) &= P(E_1 - E_2) + P(E_1 \cap E_2), \\ P(E_2) &= P(E_2 - E_1) + P(E_1 \cap E_2), \\ P(E_1 \cup E_2) &= P(E_1 - E_2) + P(E_1 \cap E_2) + P(E_2 - E_1). \end{aligned} \qquad (32.6)$$

Substituting P(E₁ − E₂) and P(E₂ − E₁) of the first two equations in the last equation, we obtain the desired result.

Using E as E₁ and Eᶜ as E₂, and noting that E − Eᶜ = E and E ∩ Eᶜ = ∅, the first (or second) equation in (32.6) gives P(E) = P(E) + P(∅), implying that

(4) P(∅) = 0.

Using E as E₁ and Eᶜ as E₂ again, and noting that E − Eᶜ = E, Eᶜ − E = Eᶜ, and E ∪ Eᶜ = 𝕊, the third equation in (32.6) and (4) give P(𝕊) = P(E) + P(Eᶜ), implying that

(5) P(Eᶜ) = 1 − P(E). ■
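The relations of Example 32.1.1 can be checked on the die example of the text with exact rational arithmetic. The sketch assumes the equal-likelihood assignment P(E) = |E|/|𝕊|; the function name `P` simply mirrors the notation and is not a library API.

```python
from fractions import Fraction

S = frozenset(range(1, 7))  # sample space of a single die

def P(event):
    """Equal-likelihood probability of an event (any subset of S)."""
    event = frozenset(event)
    if not event <= S:
        raise ValueError("an event must be a subset of the sample space")
    return Fraction(len(event), len(S))

E1 = frozenset({1, 3, 5})  # odd outcomes
E2 = frozenset({4, 5, 6})  # outcomes greater than 3

print(P(E1 | E2))                  # 5/6, not P(E1) + P(E2) = 1
print(P(E1) + P(E2) - P(E1 & E2))  # Eq. (32.5) also gives 5/6
print(P(S - E1) == 1 - P(E1))      # property (5): True
```

The same function confirms P(∅) = 0 and that the six elementary events, which partition 𝕊, have probabilities adding up to 1.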

Condition 3 of Box 32.1.1 can be generalized to the case of a collection of mutually disjoint sets E₁, E₂, ..., Eₘ:

$$P(E_1 \cup E_2 \cup \cdots \cup E_m) = P(E_1) + P(E_2) + \cdots + P(E_m) = \sum_{i=1}^{m} P(E_i). \qquad (32.7)$$

A collection of mutually disjoint sets E₁, E₂, ..., Eₘ with 𝕊 = E₁ ∪ E₂ ∪ ... ∪ Eₘ is called a partition of 𝕊. Such a collection has the property that

$$\sum_{i=1}^{m} P(E_i) = 1. \qquad (32.8)$$
Up to now, we have not assigned any value to $P(E)$ for a given set $E$, and it cannot be done without some further assumptions concerning the physical properties of the probability space and the events that make it up. In fact, if $E_1, E_2, \ldots, E_m$ partition $\mathcal{S}$, any set of nonnegative numbers $p_1, p_2, \ldots, p_m$ adding up to 1 with $P(E_i) = p_i$ will satisfy the conditions of Box 32.1.1 and will turn $\mathcal{S}$ into a probability space. Physically, however, certain choices will not make sense. For instance, for $\mathcal{S} = \{H, T\}$, the sample space of a single coin, one can set $P(H) = 0.75$ and $P(T) = 0.25$. However, this assignment is not very useful for ordinary coins, and in practice gives false results. For a probability space composed of elementary events, it is often natural to assign equal probability values to the elementary events. Thus if the $E_i$ of Equation (32.8) are all elementary, then the natural assignment would be $P(E_i) = 1/m$ for $i = 1, 2, \ldots, m$. For a coin, $m = 2$ and $P(H) = P(T) = 0.5$ is a natural choice, while for a die $P(i) = 1/6$, and for a deck of cards, $P(E_i) = 1/52$.
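To make the axioms concrete, here is a small Python sketch (an illustration, not from the text) of a finite probability space: a fair die with the equal assignment $P(\{i\}) = 1/6$ discussed above. The helper `P` and the events `E` and `F` are hypothetical choices; exact fractions avoid rounding, and the assertions check properties (4) and (5), Equation (32.8) for one partition, and the familiar rule $P(E \cup F) = P(E) + P(F) - P(E \cap F)$.

```python
from fractions import Fraction

# Hypothetical example: the sample space of one fair die, with the "natural"
# equal assignment P({i}) = 1/6 for each elementary event.
S = frozenset(range(1, 7))
p_elem = {outcome: Fraction(1, len(S)) for outcome in S}

def P(event):
    """Probability of an event (a subset of S) as a sum of elementary probabilities."""
    return sum((p_elem[o] for o in event), Fraction(0))

E = frozenset({2, 4, 6})   # "the roll is even"
F = frozenset({4, 5, 6})   # "the roll is at least 4"

assert P(S) == 1
assert P(frozenset()) == 0                   # property (4)
assert P(S - E) == 1 - P(E)                  # property (5): complement rule
assert P(E | F) == P(E) + P(F) - P(E & F)    # union of two overlapping events

# Any partition of S has probabilities adding up to 1, Eq. (32.8)
partition = [frozenset({1, 2}), frozenset({3}), frozenset({4, 5, 6})]
assert sum(P(A) for A in partition) == 1

print(P(E), P(F), P(E | F))   # 1/2 1/2 2/3
```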

32.1.3 Conditional and Marginal Probabilities 

In many situations, the sample space is partitioned in two different ways. For example, a deck of cards can be partitioned either by 4 suits or by 13 values; the employees of a company can be partitioned by gender or by the departments in which they work. Suppose $E_1, E_2, \ldots, E_m$ and $F_1, F_2, \ldots, F_n$ are two collections of events that partition $\mathcal{S}$. It should be clear that the sets $E_i \cap F_j$, $i = 1, 2, \ldots, m$; $j = 1, 2, \ldots, n$, also form a partition of $\mathcal{S}$, and that

$$\bigcup_{j=1}^{n} (E_i \cap F_j) = E_i \quad\text{and}\quad \bigcup_{i=1}^{m} (E_i \cap F_j) = F_j. \tag{32.9}$$

Since the $E_i$, the $F_j$, and the $E_i \cap F_j$ all form partitions of $\mathcal{S}$, we can define the probabilities $P(E_i)$, $P(F_j)$, and $P(E_i \cap F_j)$. Then, Equation (32.9) implies that

$$P(E_i) = \sum_{j=1}^{n} P(E_i \cap F_j) \quad\text{and}\quad P(F_j) = \sum_{i=1}^{m} P(E_i \cap F_j). \tag{32.10}$$

$P(E_i)$ and $P(F_j)$ are called marginal probabilities. Associated with the marginal probability is the conditional probability. Suppose we know that $E_i$ has occurred. What is the probability of $F_j$? For example, we draw a card from a deck of cards and somebody tells us that it is a heart. What is the probability that it is a jack? This conditional probability is denoted by $P(F_j|E_i)$ and is the probability of $F_j$ given that $E_i$ has occurred.

Example 32.1.2. The best way to understand marginal and conditional probabilities is to look at an example. Suppose that in a container, we have 100 marbles coming in three different sizes: small, medium, and large; and five different colors: white, black, red, green, and blue. Table 32.1 shows the distribution of the marbles according to color and size.

          White  Black    Red  Green   Blue  Total
Small         5      7      6      8      4     30
Medium        8     10      7     12      8     45
Large         9      5      4      3      4     25
Total        22     22     17     23     16    100

Table 32.1: The distribution of marbles according to size and color.

First note that from the very definition of probability, the chance of getting a medium red marble in a random drawing is 0.07, that of a large green marble is 0.03, and there is a likelihood of 0.05 for drawing a small white marble. Similarly, the probability that on a random drawing from the container the marble is black is 0.22, and for the marble to be medium it is 0.45. This suggests the construction of another table, Table 32.2, which shows the distribution of the probabilities according to color and size.

Each entry of the last row and last column of Table 32.2 is what we have called a marginal probability. The conditional probability that the marble is small given that its color is white is 5/22. This is because, by restricting the color to white, we limit the number of marbles to 22 rather than 100. Similarly, the probability that the marble is green given that its size is medium is 12/45; this also is a conditional probability. Conditional probabilities can be rewritten as ratios of probabilities. Thus, the probability that the marble is small given that its color is white is 0.05/0.22, and the second probability is 0.12/0.45. ■
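The bookkeeping of Example 32.1.2 can be reproduced with a few lines of Python. This sketch assumes the counts of Table 32.1; the helper names `joint`, `p_size`, and `p_color` are illustrative, and each conditional probability is computed as a ratio of probabilities, the rule that Equation (32.11) states in general.

```python
from fractions import Fraction

# Marble counts of Table 32.1: one row per size, one column per color.
colors = ["white", "black", "red", "green", "blue"]
counts = {
    "small":  [5, 7, 6, 8, 4],
    "medium": [8, 10, 7, 12, 8],
    "large":  [9, 5, 4, 3, 4],
}
total = sum(sum(row) for row in counts.values())   # 100 marbles

def joint(size, color):
    """Entry of Table 32.2: P(size and color)."""
    return Fraction(counts[size][colors.index(color)], total)

def p_size(size):
    """Marginal probability of a size (last column of Table 32.2)."""
    return Fraction(sum(counts[size]), total)

def p_color(color):
    """Marginal probability of a color (last row of Table 32.2)."""
    j = colors.index(color)
    return Fraction(sum(row[j] for row in counts.values()), total)

# Conditional probabilities as ratios of probabilities
small_given_white = joint("small", "white") / p_color("white")    # 0.05/0.22
green_given_medium = joint("medium", "green") / p_size("medium")  # 0.12/0.45
print(small_given_white, green_given_medium)   # 5/22 4/15
```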

          White  Black    Red  Green   Blue  Total
Small      0.05   0.07   0.06   0.08   0.04   0.30
Medium     0.08   0.10   0.07   0.12   0.08   0.45
Large      0.09   0.05   0.04   0.03   0.04   0.25
Total      0.22   0.22   0.17   0.23   0.16      1

Table 32.2: The distribution of probabilities according to size and color.

The results of the foregoing example can be easily generalized. Let $p_{ij} = P(E_i \cap F_j)$, construct a table with $m$ rows and $n$ columns, and fill the cells with the numbers $p_{ij}$. Add one more row for the totals with entries $P(F_1)$, $P(F_2)$, all the way to $P(F_n)$. Add one more column for the totals with entries $P(E_1)$, $P(E_2)$, all the way to $P(E_m)$. It should now be clear that $P(F_j|E_i)$, the probability of $F_j$ given that $E_i$ has occurred, is

$$P(F_j|E_i) = \frac{P(E_i \cap F_j)}{P(E_i)}. \tag{32.11}$$

Since any event and its complement partition the universal set, we can let $F_1 = A$ and $F_2 = A^c$ (only two $F$'s), and write the equation above as

$$P(A|E_i) = \frac{P(E_i \cap A)}{P(E_i)}, \tag{32.12}$$

or, if we have two sets $A$ and $B$ and their complements as two partitions of $\mathcal{S}$, then

$$P(A|B) = \frac{P(B \cap A)}{P(B)}, \tag{32.13}$$

and this is true for any two sets, provided that $P(B) > 0$.

If the probability $P(A|B)$ does not depend on the event $B$ in any way, i.e., if $P(A|B) = P(A)$, then we say that the two events $A$ and $B$ are statistically independent. Equation (32.13) now yields

$$P(A) = \frac{P(A \cap B)}{P(B)} \quad\text{or}\quad P(A \cap B) = P(A)P(B), \tag{32.14}$$

and the second equation becomes the definition for two events to be statistically independent.

It is important to differentiate between statistical independence and mutual exclusion. If two events with nonzero probabilities are mutually exclusive, then they have to be statistically dependent, since the occurrence of one precludes the occurrence of the other. Conversely, Equation (32.14) shows that if $P(A) > 0$, $P(B) > 0$, and $A$ and $B$ are statistically independent, then $P(A \cap B) \neq 0$, implying that $A \cap B \neq \emptyset$ and, therefore, that $A$ and $B$ cannot be mutually exclusive.
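Equation (32.12) can be rewritten as

$$P(A \cap E_i) = P(E_i)P(A|E_i),$$

and since the sets $A \cap E_i$ are mutually exclusive and their union is $A$, we have [see the second equation in (32.10) with $A = F_j$]

$$P(A) = \sum_{i=1}^{m} P(E_i)P(A|E_i). \tag{32.15}$$

This result is often called the law of total probability. Combined with (32.12) and (32.13), it gives Bayes' theorem,

$$P(E_k|A) = \frac{P(E_k)P(A|E_k)}{\sum_{i=1}^{m} P(E_i)P(A|E_i)}.$$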

Example 32.1.3. A selective four-year college admits mostly students whose ACT 
scores are 32 and higher, with a small number of admitted students whose scores are 
below 32. The college has a graduation rate of 97%. Of those who graduate, 98% 
have an ACT score of 32 and higher. Of those who drop out, 85% have an ACT 
score below 32. We want to calculate the probability of graduation for a student 
who has an ACT score below 32. 

Let $E_1$ and $E_2$ denote the events corresponding, respectively, to graduating and dropping out. Let $F_1$ and $F_2$ denote the events corresponding to an ACT score of 32 or higher and lower than 32, respectively. We are after $P(E_1|F_2)$.

Consider the following table, in which the most obvious probabilities are entered:

                F1        F2     Total
E1                                0.97
E2                                0.03
Total                                1
Since we are given that $P(E_1) = 0.97$ and $P(F_1|E_1) = 0.98$, we can use Equation (32.11) to find the entry $p_{11}$ in the first row and first column:

$$p_{11} = P(E_1 \cap F_1) = P(F_1|E_1)P(E_1) = 0.98 \times 0.97 = 0.9506.$$

The entry $p_{12}$ in the first row and second column can now be calculated because the total is given as 0.97:

$$p_{12} = P(E_1 \cap F_2) = 0.97 - 0.9506 = 0.0194.$$

The table now looks like

                F1        F2     Total
E1          0.9506    0.0194      0.97
E2                                0.03
Total                                1

We are also given that $P(F_2|E_2) = 0.85$. So, using Equation (32.11) again, we can find $p_{22}$:

$$p_{22} = P(E_2 \cap F_2) = P(F_2|E_2)P(E_2) = 0.85 \times 0.03 = 0.0255.$$

The remaining entries are now trivial to calculate:

$$p_{21} = P(E_2 \cap F_1) = 0.03 - 0.0255 = 0.0045,$$
$$P(F_1) = 0.9506 + 0.0045 = 0.9551,$$
$$P(F_2) = 0.0194 + 0.0255 = 0.0449,$$

and the complete table becomes

                F1        F2     Total
E1          0.9506    0.0194      0.97
E2          0.0045    0.0255      0.03
Total       0.9551    0.0449         1

The desired probability is therefore

$$P(E_1|F_2) = \frac{P(E_1 \cap F_2)}{P(F_2)} = \frac{0.0194}{0.0449} = 0.432.$$

So, there is about a 43% chance for the graduation of a student whose ACT score is below 32. ■
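As a cross-check of Example 32.1.3, the following Python sketch fills in the same table from the three given numbers and then forms $P(E_1|F_2)$. The variable names are our own, and floating-point values are rounded only for display.

```python
# Data of Example 32.1.3: E1 = graduate, E2 = drop out,
# F1 = ACT score >= 32, F2 = ACT score < 32.
p_E1 = 0.97
p_E2 = 1 - p_E1
p_F1_given_E1 = 0.98
p_F2_given_E2 = 0.85

# Joint probabilities p_ij = P(E_i and F_j) = P(F_j | E_i) P(E_i), cf. Eq. (32.11)
p11 = p_F1_given_E1 * p_E1
p12 = p_E1 - p11            # row E1 must add up to P(E1)
p22 = p_F2_given_E2 * p_E2
p21 = p_E2 - p22            # row E2 must add up to P(E2)

# Column totals (marginals), cf. Eq. (32.10)
p_F1 = p11 + p21
p_F2 = p12 + p22

# The requested conditional probability P(E1 | F2), cf. Eq. (32.13)
p_E1_given_F2 = p12 / p_F2

print(round(p11, 4), round(p12, 4), round(p21, 4), round(p22, 4))
print(round(p_F2, 4), round(p_E1_given_F2, 3))
```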


32.1.4 Average and Standard Deviation 

When we are given a set of values—say the scores of students in a class—and 
asked to find the average, we add the values and divide by the total number 
of values. If $\{x_i\}_{i=1}^{N}$ is the set of values, then the average $\bar{x}$ is given by

$$\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i.$$

This equation tacitly assumes that the probability is the same for all values and equal to $1/N$. If the probability depends on $i$, the definition of the average has to take this into account. Let $p_i$ denote the probability for the occurrence of $x_i$, and change the notation for the average to the more common notation whereby capital letters are used inside angle brackets. Then the average or mean or expectation value of $\{x_i\}_{i=1}^{N}$ is defined as

$$\langle X \rangle = \sum_{i=1}^{N} x_i p_i. \tag{32.16}$$


Another quantity of interest is the standard deviation, which is a measure of how the values are spread from the mean. It is the average "distance" between $\bar{x}$ and $x_i$. The obvious choice $x_i - \bar{x}$ will have a zero average, because it is both positive and negative and the definition of $\bar{x}$ makes the positive and negative values cancel. To avoid this cancellation, one takes the square of these differences and then averages them. The variance $\sigma^2$ is defined by

$$\sigma^2 = \frac{\sum_{i=1}^{N}(x_i - \bar{x})^2}{N},$$

and the standard deviation by

$$\sigma = \sqrt{\frac{\sum_{i=1}^{N}(x_i - \bar{x})^2}{N}}. \tag{32.17}$$


When the probability varies with $i$, the definition of the variance changes to

$$\sigma^2 = \sum_{i=1}^{N}(x_i - \langle X \rangle)^2 p_i. \tag{32.18}$$

In many situations one may be interested in the average of a quantity that depends on the random variable $x_i$. Thus, if $g(x_i)$ is such a quantity, one writes

$$\langle g(X) \rangle = \sum_{i=1}^{N} g(x_i)\,p_i. \tag{32.19}$$

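In terms of such averages, one can show (by expanding the square in (32.18) and using $\sum_i p_i = 1$) that

$$\sigma^2 = \langle X^2 \rangle - \langle X \rangle^2. \tag{32.20}$$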


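Related to averages is the moment generating function defined by

$$\langle e^{tX} \rangle = \sum_{i=1}^{N} e^{t x_i} p_i. \tag{32.21}$$

The name comes from the fact that

$$\left.\frac{d^k}{dt^k}\langle e^{tX} \rangle\right|_{t=0} = \langle X^k \rangle. \tag{32.22}$$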
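A minimal Python sketch of Equations (32.16) to (32.22) for a discrete random variable follows. The values and probabilities (a loaded four-sided die) are made up for illustration, and the derivatives in (32.22) are approximated by central finite differences with an arbitrarily chosen step `h` rather than computed analytically.

```python
import math

def expect(g, xs, ps):
    """<g(X)> = sum_i g(x_i) p_i, Eq. (32.19)."""
    return sum(g(x) * p for x, p in zip(xs, ps))

def mean(xs, ps):
    """<X> = sum_i x_i p_i, Eq. (32.16)."""
    return expect(lambda x: x, xs, ps)

def variance(xs, ps):
    """sigma^2 = sum_i (x_i - <X>)^2 p_i, Eq. (32.18)."""
    mu = mean(xs, ps)
    return expect(lambda x: (x - mu) ** 2, xs, ps)

def mgf(t, xs, ps):
    """Moment generating function <e^{tX}>, Eq. (32.21)."""
    return expect(lambda x: math.exp(t * x), xs, ps)

# Hypothetical data: values of a loaded four-sided die and their probabilities.
xs = [1, 2, 3, 4]
ps = [0.1, 0.2, 0.3, 0.4]

mu = mean(xs, ps)                                 # about 3.0
var = variance(xs, ps)                            # about 1.0
second_moment = expect(lambda x: x * x, xs, ps)   # about 10.0

# Eq. (32.20): sigma^2 = <X^2> - <X>^2
assert math.isclose(var, second_moment - mu ** 2)

# Eq. (32.22) for k = 1 and k = 2, via central finite differences at t = 0
h = 1e-4
d1 = (mgf(h, xs, ps) - mgf(-h, xs, ps)) / (2 * h)
d2 = (mgf(h, xs, ps) - 2 * mgf(0.0, xs, ps) + mgf(-h, xs, ps)) / h ** 2
print(mu, var, d1, d2)   # d1 is close to <X>, d2 is close to <X^2>
```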
32.1.5 Counting: Permutations and Combinations

The probability space of many situations is discrete. In fact, one can say 
that all probability spaces are discrete, and only in the limit of large samples 
(atoms and molecules in thermodynamics, for example) can one approximate 
the random variable as a continuous variable. Therefore, it is important to 
find formulas that give the number of particular events of a universal set. 

Suppose you have $N$ distinguishable particles and you want to place them in $M$ bins. There are two cases that are used in practice: each bin can hold as many particles as you place in it; or each bin can hold only one particle. For each case, we are interested in finding the number of distinct arrangements, or the number of configurations. Let this number be denoted by $\Omega(N, M)$.

If there is no restriction on the occupancy number, then you have $M$ choices for the first particle, $M$ choices for the second particle, etc. Therefore,

$$\Omega_{\rm MB}(N, M) = M^N. \tag{32.23}$$

In statistical mechanics, this is called the Maxwell-Boltzmann statistics.

If the occupancy number is one, then you have $M$ choices for the first particle, $M - 1$ choices for the second particle, etc. Therefore,

$$\Omega_P(N, M) = M(M-1)(M-2)\cdots(M-N+1) = \frac{M!}{(M-N)!}, \qquad M \ge N. \tag{32.24}$$

This is called the permutation of $M$ objects taken $N$ at a time. If $M = N$, then $\Omega_P(N, N) = N!$ is simply called the permutation of $N$ objects.

The elementary constituents of nature are indistinguishable or identical. How does this affect the formulas above? Let's consider the single-occupancy case first because it is easier. Equation (32.24) overcounts the arrangements by a factor of $N!$, because a permutation of the particles does not give any new arrangement. Therefore,

$$\Omega_{\rm FD}(N, M) = \frac{M!}{N!\,(M-N)!} \equiv \binom{M}{N}, \qquad M \ge N. \tag{32.25}$$

In statistical mechanics, this is called the Fermi-Dirac statistics. It is also called the combination of $M$ objects taken $N$ at a time.

The multiple-occupancy case for indistinguishable particles is harder, but there is a trick that makes the formula easier to derive. Figure 32.2(a) shows some bins with particles inside them. We can represent the arrangement by placing the particles outside and to the right of the bins and representing the bins by vertical lines as in Figure 32.2(b). Each vertical line has some particles to its right and left except the line on the extreme left, which has particles only to its right. Since there is no limitation on the occupancy number, the number of arrangements can be calculated by permuting both the lines and the dots except the line on the extreme left. Since the dots are identical (as are the lines), the problem reduces to finding the number of permutations of $N + M - 1$ objects, $N$ of which are identical and $M - 1$ of which are also identical, but different from the other $N$. Therefore,

$$\Omega_{\rm BE}(N, M) = \frac{(N+M-1)!}{N!\,(M-1)!} = \binom{N+M-1}{N}. \tag{32.26}$$

In statistical mechanics, this is called the Bose-Einstein statistics.

Figure 32.2: (a) The bins with particles inside them. (b) Bins are represented by vertical lines with the occupying particles to their right.
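The four counting formulas can be checked by brute force for small $N$ and $M$. In this Python sketch (an illustration, assuming Python 3.8+ for `math.comb` and `math.perm`; the helper `count_all` is hypothetical), `itertools` enumerates the arrangements directly: ordered tuples of bins for distinguishable particles, and sets or multisets of bins for identical ones.

```python
import math
from itertools import product, permutations, combinations, combinations_with_replacement

def count_all(N, M):
    """Brute-force counts of ways to place N particles into M bins labelled 0..M-1."""
    bins = range(M)
    # Distinguishable particles, any occupancy: (bin of particle 1, ..., bin of particle N).
    mb = sum(1 for _ in product(bins, repeat=N))
    # Distinguishable particles, at most one per bin: an ordered choice of N distinct bins.
    perm = sum(1 for _ in permutations(bins, N))
    # Identical particles, at most one per bin: only the set of occupied bins matters.
    fd = sum(1 for _ in combinations(bins, N))
    # Identical particles, any occupancy: only the multiset of occupied bins matters.
    be = sum(1 for _ in combinations_with_replacement(bins, N))
    return mb, perm, fd, be

for N, M in [(2, 3), (3, 5), (4, 4)]:
    mb, perm, fd, be = count_all(N, M)
    assert mb == M ** N                    # Eq. (32.23)
    assert perm == math.perm(M, N)         # Eq. (32.24)
    assert fd == math.comb(M, N)           # Eq. (32.25)
    assert be == math.comb(N + M - 1, N)   # Eq. (32.26)
    print(N, M, mb, perm, fd, be)
```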


32.2 Binomial Probability Distribution 

The Fermi-Dirac statistics is closely related to the so-called binomial distribution: each of the $M$ bins has two states, either occupied or empty. There are many situations where the binomial distribution applies. For example, in tossing $n$ coins, each coin can be a head or a tail; a quantum mechanical spin-half particle can have its spin "up" or "down"; in a binary alloy system each site of the alloy can be occupied by an atom A or B.

In fact, the binomial distribution is more general than this. In any trial, 
one can talk about success and failure, where success refers to one particular 
outcome (out of the many possible outcomes), and failure to the rest of the 
possible outcomes. Thus, if we are after a 6 in a toss of a die, then getting a 
6 is a success, and getting 1, 2, 3, 4, or 5 is a failure. 

Let $p$ be the success probability; then the failure probability is $q = 1 - p$. What is the probability $P(m, n)$ that in $n$ trials we have $m$ successes? Because the events are statistically independent (what happens in each trial is independent of what has happened and what will happen), by (32.14), the probabilities multiply. Thus the probability of $m$ successes and $n - m$ failures in a particular order is $p^m q^{n-m}$, and since there are $\binom{n}{m}$ ways that this can happen in $n$ trials, the probability $P(m, n)$ of $m$ successes in $n$ trials is

$$P(m, n) = \binom{n}{m} p^m q^{n-m} = \frac{n!}{m!\,(n-m)!}\, p^m q^{n-m}. \tag{32.27}$$

Using the Stirling approximation $x! \approx \sqrt{2\pi}\, e^{-x} x^{x+1/2}$ of Equation (11.6), the reader can show that

$$P(m, n) \approx 2^n \sqrt{\frac{2}{n\pi}}\; e^{-(n-2m)^2/2n}\, p^m q^{n-m}, \tag{32.28}$$

assuming that both $m$ and $n - m$ are large, which is true in almost all cases of large systems, and that $m$ is close to $n/2$.
The special case of $p = q = \frac{1}{2}$ is of importance:

$$P(m, n) = \frac{n!}{m!\,(n-m)!}\,\frac{1}{2^n} \approx \sqrt{\frac{2}{n\pi}}\; e^{-(n-2m)^2/2n}. \tag{32.29}$$

Sometimes (32.29) is written in terms of the difference $s$ between the number of successes and failures. This difference is $s = m - (n - m) = 2m - n$, whose square conveniently appears in the exponent of the exponential. We call $s$ the success excess. Thus, (32.29) becomes

$$P(s, n) \approx \sqrt{\frac{2}{n\pi}}\; e^{-s^2/2n}, \qquad \Omega_b(s, n) = 2^n P(s, n) \approx 2^n \sqrt{\frac{2}{n\pi}}\; e^{-s^2/2n}, \tag{32.30}$$

where $\Omega_b$ is the number of configurations, now written in terms of $s$.

For the binomial distribution we can easily find the moment generating function. From its definition, we have

$$\langle e^{tX} \rangle = \sum_{x=0}^{n} e^{tx} P(x, n) = \sum_{x=0}^{n} \binom{n}{x} (pe^t)^x q^{n-x} = (pe^t + q)^n, \tag{32.31}$$

the last equality following from the binomial theorem. Equation (32.31) allows us to easily calculate the average and variance for the binomial distribution. First note that

$$\frac{d}{dt}\langle e^{tX} \rangle = npe^t (pe^t + q)^{n-1}, \qquad \frac{d^2}{dt^2}\langle e^{tX} \rangle = npe^t (q + npe^t)(pe^t + q)^{n-2}.$$

Now evaluate these at $t = 0$, and note that $p + q = 1$, to obtain

$$\langle X \rangle = np, \qquad \langle X^2 \rangle = n^2p^2 + npq, \qquad \sigma^2 = \langle X^2 \rangle - \langle X \rangle^2 = npq. \tag{32.32}$$
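To see (32.31) and (32.32) at work, this Python sketch sums directly over the binomial distribution for one arbitrarily chosen pair $(n, p)$ and compares the results with the closed forms. The function names are illustrative.

```python
import math

def binomial_pmf(m, n, p):
    """P(m, n) = C(n, m) p^m q^(n-m), Eq. (32.27)."""
    return math.comb(n, m) * p ** m * (1 - p) ** (n - m)

def binomial_moments(n, p):
    """Return (<X>, <X^2>, sigma^2) by summing directly over the distribution."""
    probs = [binomial_pmf(x, n, p) for x in range(n + 1)]
    first = sum(x * w for x, w in enumerate(probs))
    second = sum(x * x * w for x, w in enumerate(probs))
    return first, second, second - first ** 2

n, p = 40, 0.3   # arbitrary illustrative parameters
q = 1 - p
first, second, var = binomial_moments(n, p)

# Compare with the closed forms of Eq. (32.32)
assert math.isclose(first, n * p)
assert math.isclose(second, n**2 * p**2 + n * p * q)
assert math.isclose(var, n * p * q)

# Compare the MGF sum with (p e^t + q)^n, Eq. (32.31), at one value of t
t = 0.2
mgf_sum = sum(math.exp(t * x) * binomial_pmf(x, n, p) for x in range(n + 1))
assert math.isclose(mgf_sum, (p * math.exp(t) + q) ** n)
print(first, var)   # close to 12.0 and 8.4
```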




Example 32.2.1. Assume that the probability at birth that the newborn is male (or female) is $\frac{1}{2}$. What is the probability that in a household of six, three are male? Blind intuition tells us that the probability is 1; but that is wrong! Rephrasing the question to "What is the probability that in six trials we get three successes?" leads us to the binomial distribution and the following answer:

$$P(3, 6) = \frac{6!}{3!\,3!}\left(\frac{1}{2}\right)^3\left(\frac{1}{2}\right)^{6-3} = \frac{6!}{3!\,3!}\left(\frac{1}{2}\right)^6 = 0.3125.$$

This result may be surprising, but even more surprising is the result we obtain if we ask the same question about a (small) school: "What is the probability that in a school with 200 pupils, 100 are male?"

$$P(100, 200) = \frac{200!}{100!\,100!}\left(\frac{1}{2}\right)^{100}\left(\frac{1}{2}\right)^{200-100} = \frac{200!}{100!\,100!}\left(\frac{1}{2}\right)^{200} \approx 0.056.$$

■
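The two numbers in Example 32.2.1 can be reproduced exactly with integer arithmetic and compared with the large-$n$ approximation (32.29). A short Python sketch, with illustrative function names:

```python
import math

def p_half(m, n):
    """Exact P(m, n) for p = q = 1/2, first form of Eq. (32.29)."""
    return math.comb(n, m) / 2 ** n

def p_half_gauss(m, n):
    """Large-n approximation sqrt(2/(n pi)) exp(-(n-2m)^2 / 2n), Eq. (32.29)."""
    return math.sqrt(2 / (n * math.pi)) * math.exp(-(n - 2 * m) ** 2 / (2 * n))

print(p_half(3, 6))                     # 0.3125
print(round(p_half(100, 200), 4))       # exact value for the school
print(round(p_half_gauss(100, 200), 4)) # Gaussian approximation
```

Python integers have arbitrary precision, so `math.comb(200, 100) / 2 ** 200` is evaluated without overflow.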
The surprise encountered in the preceding example is due to the confusion caused by mixing the expected value with its probability. In a binomial distribution, the expected value (or the mean or average) is $\langle X \rangle = np$, or $\langle X \rangle = n/2$ when $p = 0.5$, which is the answer we intuitively gave to the two questions in the example above. Since our surprise increases with $n$, let us investigate the behavior of the binomial distribution for values of $m$ close to the mean for very large $n$.

For large $n$ and $m$, we can use (32.29), from which we obtain

$$P(n/2, n) \approx \sqrt{\frac{2}{n\pi}}.$$

This shows that $P(n/2, n) \to 0$ as $n \to \infty$. Thus the probability of having $n/2$ successes in $n$ trials becomes negligible as the number of trials increases. But this is the maximum probability! Therefore, every other probability is even smaller. Where have all the probabilities gone?

Consider the graph of (32.29) for large $n$ and plot it to a scale such that the peak of the maximum, although small, is conspicuous. Figure 32.3 shows such a graph for $n = 1000$. Note that the maximum probability has a value of only 0.025, and that the graph drops to a value that is indistinguishable from zero at about $m = 560$ on the right and $m = 440$ on the left. We can actually calculate the ratio $r$ of the small probability at $m = 560$ to the maximum at $m = 500$ using (32.29):

$$r = \frac{P(560, 1000)}{P(500, 1000)} = \frac{\sqrt{2/1000\pi}\; e^{-(-120)^2/2000}}{\sqrt{2/1000\pi}} = e^{-(-120)^2/2000} \approx 0.00075.$$



Figure 32.3: The plot of the binomial probability distribution for n = 1000. 
This same ratio is obtained if we use $m = 440$, and we can therefore conclude that the nonzero probabilities are essentially concentrated between $m_- = 440$ and $m_+ = 560$ for $n = 1000$.

Now let's turn to a general $n$ and find the corresponding values of $m_-$ and $m_+$. These are the values of $m$ at which the probability drops to 0.00075 of its maximum value. To find $m_\pm$, we have to solve the equation

$$\frac{P(m, n)}{P(n/2, n)} = \frac{\sqrt{2/n\pi}\; e^{-(n-2m)^2/2n}}{\sqrt{2/n\pi}} = e^{-(n-2m)^2/2n} = 0.00075.$$

The answer is

$$m_\pm = \tfrac{1}{2}\left(n \pm 3.8\sqrt{n}\right), \tag{32.33}$$

as the reader may verify. So for a general $n$, a large fraction of the total probability is concentrated between $\frac{1}{2}(n - 3.8\sqrt{n})$ and $\frac{1}{2}(n + 3.8\sqrt{n})$. But how much? What is the probability that the number of successes lies between $m_-$ and $m_+$?

To answer this question, we have to add all probabilities between $m_-$ and $m_+$. For large numbers, one can replace the sum with an integral:

$$P(m_- < m < m_+, n) \approx \sqrt{\frac{2}{n\pi}}\int_{m_-}^{m_+} e^{-(n-2m)^2/2n}\, dm.$$

Define a new variable of integration $x$ so that the exponent of the integrand becomes $-x^2$. This means

$$x = \frac{n - 2m}{\sqrt{2n}} \quad\text{or}\quad m = \frac{n - \sqrt{2n}\,x}{2}, \qquad dm = -\sqrt{\frac{n}{2}}\, dx,$$

and the integral in terms of $x$ becomes

$$P(m_- < m < m_+, n) \approx \frac{1}{\sqrt{\pi}}\int_{-3.8/\sqrt{2}}^{3.8/\sqrt{2}} e^{-x^2}\, dx = 0.999855,$$

with the last result obtained by numerical integration. Therefore,

$$P(m_- < m < m_+, n) = 0.999855, \tag{32.34}$$

which is, interestingly, independent of $n$.
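The integral in (32.34) is an error function, $\frac{1}{\sqrt{\pi}}\int_{-a}^{a} e^{-x^2}\,dx = \operatorname{erf}(a)$ with $a = 3.8/\sqrt{2}$, so Python's `math.erf` reproduces 0.999855. As a further illustrative check, the sketch below also sums the exact binomial probabilities with $p = \frac{1}{2}$ over the integers between $m_-$ and $m_+$ of (32.33) (the helper `exact_fraction` is hypothetical). Because of the discreteness and the inclusive endpoints, these sums should be close to, but not exactly equal to, 0.999855.

```python
import math

# (1/sqrt(pi)) * integral_{-a}^{a} exp(-x^2) dx equals erf(a); here a = 3.8/sqrt(2).
a = 3.8 / math.sqrt(2)
gauss_estimate = math.erf(a)

def exact_fraction(n):
    """Exact binomial probability (p = 1/2) that m_- <= m <= m_+, with m_pm from Eq. (32.33)."""
    lo = math.ceil(0.5 * (n - 3.8 * math.sqrt(n)))
    hi = math.floor(0.5 * (n + 3.8 * math.sqrt(n)))
    return sum(math.comb(n, m) for m in range(lo, hi + 1)) / 2 ** n

print(round(gauss_estimate, 6))   # 0.999855
for n in (100, 1000, 10000):
    print(n, exact_fraction(n))
```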

Let us investigate the meaning of this result. It says that for very large $n$, 99.99% of the time the number of successes lies between $m_-$ and $m_+$, and the probability that it falls outside this range is only 0.0145%. Note also that when $n$ gets large, $m_-$ and $m_+$ become very nearly equal to $n/2$ (relative to $n$). For example, if $n = 10^9$, then

$$\frac{n}{2} = 5 \times 10^8 \quad\text{and}\quad m_+ = \tfrac{1}{2}\left(10^9 + 3.8\sqrt{10^9}\right) \approx 5.001 \times 10^8.$$
So the probabilities are concentrated in a very narrow interval; i.e., the probability curve is extremely sharp.

Going back to the probability of the male gender, we note that in a very populous country such as China or India with approximately $10^9$ inhabitants, although the probability that exactly half the population is male is extremely small, of the order of

$$\sqrt{\frac{2}{10^9\,\pi}} \approx 0.000025,$$

the probability of the male population deviating too far from half is also small. So although the exact success (half male) is highly unlikely, a number of successes very close to exact is almost certain.

Example 32.2.2. An isolated spin-½ particle has equal probability of being in either the spin-up or the spin-down state. If there are $n$ such particles, then the probability of $m$ of them being in the up state is given by (32.29) or (32.30).

When a spin-½ particle with magnetic moment $\mu$ is placed in a magnetic field $B$, it has two possible states: in the direction of the field (called up) and opposite to it (called down). In the first case the energy of the particle is $-\mu B$ and in the second case $+\mu B$. The energy of the system is therefore determined by the success excess $s$, which in the present context is called the spin excess.

Now suppose that you have two systems that can exchange energy between themselves with the combined system isolated. This means that the total energy of the system is conserved. This energy is determined by the total spin excess $s$. Let $\Omega_{1b}(s_1, n_1)$ be the configuration number of the first system and $\Omega_{2b}(s_2, n_2)$ that of the second. Let $\Omega_b(s, n)$ be the number of configurations for the combined system, where $n = n_1 + n_2$ and $s = s_1 + s_2$ is a constant. Since the total configuration number is the product of the configuration numbers of the components, we have

$$\Omega_b(s, n) = \Omega_{1b}(s_1, n_1)\,\Omega_{2b}(s - s_1, n_2) = C\exp\left(-\frac{s_1^2}{2n_1} - \frac{(s - s_1)^2}{2n_2}\right), \tag{32.35}$$

where $C$ is independent of $s_1$.

What is the equilibrium state of the system? This corresponds to the most probable state of the combined system, i.e., the state that maximizes $\Omega_b(s, n)$. Instead of maximizing $\Omega_b$, let's maximize its logarithm, which is

$$\ln \Omega_b = \ln C - \frac{s_1^2}{2n_1} - \frac{(s - s_1)^2}{2n_2}.$$

Differentiating with respect to $s_1$, we get

$$\frac{\partial \ln \Omega_b}{\partial s_1} = -\frac{s_1}{n_1} + \frac{s - s_1}{n_2}. \tag{32.36}$$

Note that the second derivative is

$$\frac{\partial^2 \ln \Omega_b}{\partial s_1^2} = -\frac{1}{n_1} - \frac{1}{n_2},$$

which is negative, so the extremum is a maximum. Setting (32.36) equal to zero yields the most probable configuration, which is the condition for thermal equilibrium:

$$\frac{\hat{s}_1}{n_1} = \frac{s - \hat{s}_1}{n_2} = \frac{\hat{s}_2}{n_2} = \frac{s}{n},$$

where the last equality follows from the previous ones (see Problem 32.13) and a caret on a quantity indicates its value at maximum. If we substitute these in (32.35), we get the maximum number of configurations:

$$\hat{\Omega}_b(s, n) = C\exp\left(-\frac{\hat{s}_1^2}{2n_1} - \frac{\hat{s}_2^2}{2n_2}\right) = Ce^{-s^2/2n}. \tag{32.37}$$

The verification of the last equality is the subject of Problem 32.14.

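Once in equilibrium, how likely is it for the system to move away from it? To investigate this, let $s_1$ and $s_2$ be slightly different from their equilibrium values:

$$s_1 = \hat{s}_1 + \delta, \qquad s_2 = s - s_1 = s - \hat{s}_1 - \delta = \hat{s}_2 - \delta.$$

Substituting these in (32.35) yields

$$\Omega_{1b}(\hat{s}_1 + \delta, n_1)\,\Omega_{2b}(\hat{s}_2 - \delta, n_2) = C\exp\left(-\frac{\hat{s}_1^2 + 2\hat{s}_1\delta + \delta^2}{2n_1} - \frac{\hat{s}_2^2 - 2\hat{s}_2\delta + \delta^2}{2n_2}\right) = \hat{\Omega}_b(s, n)\exp\left(-\frac{2\hat{s}_1\delta + \delta^2}{2n_1} - \frac{\delta^2 - 2\hat{s}_2\delta}{2n_2}\right).$$

But $\hat{s}_1/n_1 = \hat{s}_2/n_2$, so the terms linear in $\delta$ cancel. Therefore,

$$\frac{\Omega_{1b}(\hat{s}_1 + \delta, n_1)\,\Omega_{2b}(\hat{s}_2 - \delta, n_2)}{\hat{\Omega}_b(s, n)} = \exp\left(-\frac{\delta^2}{2n_1} - \frac{\delta^2}{2n_2}\right). \tag{32.38}$$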


As a realistic numerical example, let $n_1 = n_2 = 10^{23}$ and $\delta = 10^{13}$, so that the fractional deviation $\delta/n_1 = 10^{-10}$, a very small number. Then the ratio in (32.38) is $e^{-1000} \approx 5 \times 10^{-435}$. The ratio for fractional deviations larger than $10^{-10}$ is smaller than $e^{-1000}$. Bounding each such term by $e^{-1000}$ and adding the terms (about $n_1$ of them), the upper bound for the total probability becomes $n_1 e^{-1000}$, or $5 \times 10^{-412}$, times the probability of finding the system in equilibrium.

What is the meaning behind the statement "the probability to find the system with a fractional deviation larger than $10^{-10}$ is $5 \times 10^{-412}$ of the probability of finding the system in equilibrium"? To have a reasonable chance of finding the system in such a deviated state, we have to sample about $1/(5 \times 10^{-412}) = 2 \times 10^{411}$ similar systems. Even if we could sample at the rate of $10^{12}$ systems per second, we would have to sample for $2 \times 10^{399}$ seconds, or over $10^{391}$ years, or more than $10^{381}$ times the age of the universe! Therefore, it is safe to say that deviations described above will never be observed. ■
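The tiny numbers in this example underflow ordinary floating-point arithmetic (`math.exp(-1000)` is simply 0.0), so a sketch like the following works with base-10 logarithms instead. It uses the values $n_1 = n_2 = 10^{23}$ and $\delta = 10^{13}$ from the text; the variable names are our own.

```python
import math

n1 = n2 = 1e23          # numbers of spins, as in the text
delta = 1e13            # deviation of s1 from its equilibrium value

exponent = delta**2 / (2 * n1) + delta**2 / (2 * n2)   # exponent in Eq. (32.38)
log10_ratio = -exponent / math.log(10)                  # log10 of e^(-exponent)
log10_bound = math.log10(n1) + log10_ratio              # log10 of n1 * e^(-exponent)

print(round(exponent, 6))       # 1000.0
print(round(log10_ratio, 2))    # -434.29, i.e. about 5 x 10^-435
print(round(log10_bound, 2))    # -411.29, i.e. about 5 x 10^-412
```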




32.3 Poisson Distribution 

Poisson processes are among the best-known topics in probability theory. A Poisson process occurs in circumstances under which an event is repeated at a constant rate of probability. Suppose that $dt$ is so small that the probability of the occurrence of two or more successes in $dt$ is negligible. Then the probability $P_1(dt)$ of one success in $dt$ is $\nu\,dt$, where $\nu$ is a constant.
We are interested in $P_n(t)$, the probability of $n$ successes in a time interval $t$. We can obtain a recursive differential equation involving $P_n(t)$ and $P_{n-1}(t)$, which we hope to solve to get $P_n(t)$. Consider $P_n(t + dt)$, the probability that $n$ successes occur in time $t + dt$. This can be written as the sum of two disjoint probabilities, each consisting of the product of two independent probabilities: (a) $n$ successes occur in time $t$ and none in time $dt$; (b) $n - 1$ successes occur in time $t$ and one in time $dt$. In symbols,

$$P_n(t + dt) = P_n(t)P_0(dt) + P_{n-1}(t)P_1(dt).$$

But $P_1(dt) = \nu\,dt$ and $P_0(dt) = 1 - P_1(dt) = 1 - \nu\,dt$. Therefore,

$$P_n(t + dt) = P_n(t)(1 - \nu\,dt) + P_{n-1}(t)\,\nu\,dt.$$

Expanding the left-hand side as $P_n(t + dt) = P_n(t) + \dfrac{dP_n}{dt}\,dt$ and dividing both sides by $dt$, we obtain the desired recursive DE:

$$\frac{dP_n}{dt} + \nu P_n(t) = \nu P_{n-1}(t). \tag{32.39}$$

For $n = 0$ the right-hand side is zero and the DE

$$\frac{dP_0(t)}{dt} + \nu P_0(t) = 0$$

has the solution $P_0(t) = Ae^{-\nu t}$. The fact that the probability of no success in a zero time interval is 1 yields $A = 1$.

Equation (32.39) is a first-order DE which can be solved. In fact, the solution is given in Theorem 23.3.1, where in the case at hand the integrating factor is

$$\exp\left(\int \nu\, dt\right) = e^{\nu t}$$

and

$$P_n(t) = e^{-\nu t}\left[C + \nu\int_0^t e^{\nu t_1} P_{n-1}(t_1)\, dt_1\right].$$

We must have $P_n(0) = 0$ (for $n \ge 1$) because there is no chance that $n$ successes can be achieved in a zero time interval. This sets $C = 0$, and we get the integral recursion relation:

$$P_n(t) = \nu e^{-\nu t}\int_0^t e^{\nu t_1} P_{n-1}(t_1)\, dt_1. \tag{32.40}$$


Substitute for $P_{n-1}(t_1)$ as an integral of $P_{n-2}$ to get

$$P_n(t) = \nu e^{-\nu t}\int_0^t e^{\nu t_1}\left(\nu e^{-\nu t_1}\int_0^{t_1} e^{\nu t_2} P_{n-2}(t_2)\, dt_2\right) dt_1,$$

or

$$P_n(t) = \nu^2 e^{-\nu t}\int_0^t\int_0^{t_1} e^{\nu t_2} P_{n-2}(t_2)\, dt_2\, dt_1.$$

It should now be clear that if we repeat this $k$ times, we get

$$P_n(t) = \nu^k e^{-\nu t}\int_0^t\int_0^{t_1}\cdots\int_0^{t_{k-2}}\int_0^{t_{k-1}} e^{\nu t_k} P_{n-k}(t_k)\, dt_k\, dt_{k-1}\cdots dt_1.$$

In particular, when $k = n$,

$$P_n(t) = \nu^n e^{-\nu t}\int_0^t\int_0^{t_1}\cdots\int_0^{t_{n-1}} e^{\nu t_n} P_0(t_n)\, dt_n\, dt_{n-1}\cdots dt_1.$$

But $P_0(t) = e^{-\nu t}$, so $P_0(t_n) = e^{-\nu t_n}$, and the above equation becomes

$$P_n(t) = \nu^n e^{-\nu t}\int_0^t\int_0^{t_1}\cdots\int_0^{t_{n-1}} dt_n\, dt_{n-1}\cdots dt_1. \tag{32.41}$$


Starting with the innermost integral over $t_n$ and integrating over all the $t_i$'s, the reader can show that the result will be $t^n/n!$. We thus finally obtain

$$P_n(t) = \frac{(\nu t)^n}{n!}\, e^{-\nu t}. \tag{32.42}$$

The Poisson process is naturally a time-dependent process, and $\nu$ is the rate or the frequency of that process.
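As a numerical sanity check of the derivation, one can integrate the recursive DE (32.39) directly and compare with (32.42). This Python sketch uses the simple forward-Euler method with an arbitrarily chosen rate, time, and step count; it is an illustration, not a recommended ODE solver, and the function names are hypothetical.

```python
import math

def poisson_process(n, t, nu):
    """Closed form P_n(t) = (nu t)^n e^(-nu t) / n!, Eq. (32.42)."""
    return (nu * t) ** n * math.exp(-nu * t) / math.factorial(n)

def euler_solution(n_max, t_final, nu, steps=20_000):
    """Forward-Euler integration of dP_n/dt = nu (P_{n-1} - P_n), Eq. (32.39),
    for n = 0..n_max, starting from P_0(0) = 1 and P_n(0) = 0 (with P_{-1} = 0)."""
    dt = t_final / steps
    P = [1.0] + [0.0] * n_max
    for _ in range(steps):
        # The right-hand side uses the old list P throughout this step.
        P = [P[n] + dt * nu * ((P[n - 1] if n > 0 else 0.0) - P[n])
             for n in range(n_max + 1)]
    return P

nu, t = 1.5, 2.0   # arbitrary illustrative rate and elapsed time
numeric = euler_solution(5, t, nu)
for n in range(6):
    print(n, round(numeric[n], 4), round(poisson_process(n, t, nu), 4))
```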

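The discrete Poisson distribution $p(n)$ is defined by setting $\nu t = \lambda$ to obtain

$$p(n) = \frac{\lambda^n}{n!}\, e^{-\lambda}, \qquad n = 0, 1, 2, \ldots. \tag{32.43}$$

The moment generating function is

$$\langle e^{tX} \rangle = \sum_{x=0}^{\infty} e^{tx}\,\frac{\lambda^x}{x!}\, e^{-\lambda} = e^{-\lambda}\sum_{x=0}^{\infty} \frac{(\lambda e^t)^x}{x!} = e^{\lambda(e^t - 1)}. \tag{32.44}$$

This gives

$$\frac{d}{dt}\langle e^{tX} \rangle = \lambda e^t e^{\lambda(e^t - 1)}, \qquad \frac{d^2}{dt^2}\langle e^{tX} \rangle = \lambda e^t e^{\lambda(e^t - 1)}(\lambda e^t + 1).$$

Evaluating these at $t = 0$ yields

$$\langle X \rangle = \lambda, \qquad \langle X^2 \rangle = \lambda(\lambda + 1), \qquad \sigma^2 = \langle X^2 \rangle - \langle X \rangle^2 = \lambda. \tag{32.45}$$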
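The moments (32.45) and the generating function (32.44) can also be checked by summing the series directly. In this Python sketch the value of $\lambda$ and the cutoff at 100 terms are arbitrary choices; for $\lambda = 3.7$ the neglected tail is far below double precision.

```python
import math

def poisson_pmf(n, lam):
    """p(n) = lam^n e^(-lam) / n!, Eq. (32.43)."""
    return lam ** n * math.exp(-lam) / math.factorial(n)

lam = 3.7            # arbitrary illustrative value of lambda
ns = range(100)      # terms beyond n = 99 are negligible for this lambda

total = sum(poisson_pmf(n, lam) for n in ns)
mean = sum(n * poisson_pmf(n, lam) for n in ns)
second = sum(n * n * poisson_pmf(n, lam) for n in ns)
var = second - mean ** 2

t = 0.3              # one sample point for the moment generating function
mgf_sum = sum(math.exp(t * n) * poisson_pmf(n, lam) for n in ns)
mgf_closed = math.exp(lam * (math.exp(t) - 1))    # Eq. (32.44)

print(total, mean, var)   # close to 1, lambda, lambda
print(mgf_sum, mgf_closed)
```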


Example 32.3.1. A city had 24 major fire accidents in a year. What is the probability that there will be (a) one major fire next month, (b) at least 5 major fires in the next 6 months?

Here $\nu$, the frequency of fire, is 24 per year or 2 per month. So for (a) we have $\lambda = 2 \times 1 = 2$ and

$$p(1) = \lambda e^{-\lambda} = 2e^{-2} = 0.27.$$

For (b), $\lambda = 2 \times 6 = 12$ and

$$p(n \ge 5) = 1 - p(0) - p(1) - p(2) - p(3) - p(4) = 1 - e^{-12} - 12e^{-12} - \frac{12^2}{2!}e^{-12} - \frac{12^3}{3!}e^{-12} - \frac{12^4}{4!}e^{-12} = 0.992.$$
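A short Python sketch reproducing both answers of Example 32.3.1 (the function and variable names are illustrative):

```python
import math

def poisson_pmf(n, lam):
    """p(n) = lam^n e^(-lam) / n!, Eq. (32.43)."""
    return lam ** n * math.exp(-lam) / math.factorial(n)

rate_per_month = 24 / 12                   # nu = 2 fires per month

# (a) exactly one fire in one month: lambda = 2
p_a = poisson_pmf(1, rate_per_month * 1)

# (b) at least five fires in six months: lambda = 12
lam_b = rate_per_month * 6
p_b = 1 - sum(poisson_pmf(n, lam_b) for n in range(5))

print(round(p_a, 2), round(p_b, 3))        # 0.27 0.992
```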
Example 32.3.2. In a department store, 39 light bulbs burn out per year. The light bulbs are replaced from the store stock, which is replenished every week. What is the minimum number of light bulbs the stock should hold so that the store will have all its light bulbs working with a probability of at least 99%?

The frequency is $\nu = 39/52 = 0.75$ per week. Thus, $\lambda = 0.75 \times 1 = 0.75$. Let $n$ stand for the number of bulbs burnt out per week and $m$ the number of bulbs in the stock. Then the store will have all its lights on as long as $n \le m$. Therefore, we want $p(n \le m) \ge 0.99$. This gives

$$p(n \le m) = \sum_{n=0}^{m} \frac{\lambda^n}{n!}\, e^{-\lambda} \ge 0.99 \quad\text{or}\quad \sum_{n=0}^{m} \frac{(0.75)^n}{n!} \ge 0.99\, e^{0.75} = 2.096,$$

or

$$1 + 0.75 + \frac{0.75^2}{2!} + \cdots + \frac{0.75^m}{m!} \ge 2.096.$$

By trial and error, the reader can verify that $m = 3$. ■
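The trial-and-error search in Example 32.3.2 is easy to automate: accumulate $p(0) + p(1) + \cdots$ until the sum reaches 0.99. A minimal Python sketch, with a hypothetical helper `min_stock`:

```python
import math

def min_stock(lam, target=0.99):
    """Smallest m with p(n <= m) = sum_{n=0}^{m} lam^n e^(-lam)/n! >= target."""
    m, cumulative = 0, math.exp(-lam)   # p(n <= 0) = p(0)
    while cumulative < target:
        m += 1
        cumulative += lam ** m * math.exp(-lam) / math.factorial(m)
    return m, cumulative

lam = 39 / 52                      # 0.75 burnt-out bulbs per week
m, prob = min_stock(lam)
print(m, round(prob, 4))           # 3 0.9927
```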

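The Poisson distribution is the limiting case of the binomial distribution when $n \to \infty$, $p \to 0$, and $\lambda = np$ is constant. To see this, expand $n!/(n - m)!$ using the Stirling approximation:

$$\frac{n!}{(n - m)!} \approx \frac{\sqrt{2\pi}\, e^{-n} n^{n+1/2}}{\sqrt{2\pi}\, e^{-n+m}(n - m)^{n-m+1/2}} = e^{-m}\left(\frac{n}{n - m}\right)^{n+1/2}(n - m)^m.$$

Now note that

$$\left(\frac{n}{n - m}\right)^{n+1/2} = \frac{1}{(1 - m/n)^{n+1/2}} \to e^{m}$$

for $n \gg m$, and $(n - m)^m \approx n^m$, so that $n!/(n - m)! \approx n^m$. Furthermore,

$$q^{n-m} = (1 - p)^{n-m} = \left(1 - \frac{\lambda}{n}\right)^{n-m} \to e^{-\lambda}.$$

Substituting all this in (32.27) yields

$$P(m, n) \to \frac{n^m}{m!}\, p^m e^{-\lambda} = \frac{(np)^m}{m!}\, e^{-\lambda} = \frac{\lambda^m}{m!}\, e^{-\lambda},$$

which is the Poisson distribution $p(m)$.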
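The limit can be watched numerically. This Python sketch keeps $\lambda = np$ fixed at an arbitrary value, lets $n$ grow, and prints the largest difference between (32.27) and (32.43) over the first ten values of $m$; the difference should shrink steadily as $n$ grows. Function names are illustrative.

```python
import math

def binomial_pmf(m, n, p):
    """Binomial P(m, n) of Eq. (32.27)."""
    return math.comb(n, m) * p ** m * (1 - p) ** (n - m)

def poisson_pmf(m, lam):
    """Poisson p(m) of Eq. (32.43)."""
    return lam ** m * math.exp(-lam) / math.factorial(m)

lam = 2.0          # arbitrary fixed value of lambda = n p
gaps = []
for n in (10, 100, 1000, 10000):
    p = lam / n
    # Largest discrepancy over m = 0, 1, ..., 9
    gap = max(abs(binomial_pmf(m, n, p) - poisson_pmf(m, lam)) for m in range(10))
    gaps.append(gap)
    print(n, gap)
```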

Example 32.3.3. A 3000-letter-long message has been transmitted electronically with an error probability of $10^{-3}$ per letter. What is the probability that there are at least two errors in the message?

This is a binomial distribution (error is success!) with small probability and large $n$. Therefore, we can use the Poisson distribution (32.43) with $\lambda = np = 3000 \times 10^{-3} = 3$. Then

$$p(n \ge 2) = 1 - p(0) - p(1) = 1 - e^{-3} - 3e^{-3} = 0.8.$$

The probability that there is exactly one error in the message is

$$p(1) = 3e^{-3} = 0.149.$$

■
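Since the exact binomial answer is also easy to compute here, a short Python sketch can show how good the Poisson approximation of Example 32.3.3 is (variable names are our own):

```python
import math

n, p = 3000, 1e-3
lam = n * p                                     # 3.0

# Exact binomial answers, Eq. (32.27)
exact_at_least_two = 1 - (1 - p) ** n - n * p * (1 - p) ** (n - 1)
exact_one = n * p * (1 - p) ** (n - 1)

# Poisson approximations used in the text
poisson_at_least_two = 1 - math.exp(-lam) - lam * math.exp(-lam)
poisson_one = lam * math.exp(-lam)

print(round(exact_at_least_two, 4), round(poisson_at_least_two, 4))
print(round(exact_one, 4), round(poisson_one, 4))
```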





32.4 Continuous Random Variable 


Most probability sample spaces are so large that approximating the discrete 
events with continuous variables becomes very useful and accurate. Take the 
case of the binomial distribution discussed above. We started with discrete 
counting, but when our sample grows to 10 , not only does the discrete sum 
become unmanageable, it becomes unnecessary as well. This is also reflected 
in the replacement of the strictly discrete factorial with the more adaptable 
exponential function through the use of the Stirling approximation. 

When continuous variables are used, probability is described by probability density. In the case of a single random variable $x$, the probability density $f(x)$ is used to give the probability that $x$ lies in an interval of length $dx$:

$$P(x - dx/2 < x < x + dx/2) = f(x)\, dx, \qquad \int_a^b f(x)\, dx = 1,$$

where $(a, b)$ is the interval for which $x$ is defined. This interval can be taken to be $(-\infty, \infty)$ by assigning zero probability density to points on the left of $a$ and on the right of $b$. The integral describes the total probability, which is 1 as in the case of the discrete variable.

For more variables, the generalization is clear. If $\mathbf{x} = (x_1, x_2, \ldots, x_m)$ and the probability density function is $f(\mathbf{x})$, then the probability that $\mathbf{x}$ is in an infinitesimal $m$-dimensional volume $d^m x$ is

$$P(\mathbf{x} \in d^m x) = f(\mathbf{x})\, d^m x, \qquad \int_\Omega f(\mathbf{x})\, d^m x = 1, \tag{32.46}$$

where $\Omega$ is the region for which $f(\mathbf{x})$ is defined. If $V$ is a subset of $\Omega$, then

$$P(\mathbf{x} \in V) = \int_V f(\mathbf{x})\, d^m x$$

gives the probability that $\mathbf{x}$ lies in $V$.

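For example, a quantum mechanical wave function $\Psi(\mathbf r)$ gives rise to a density $f(\mathbf r) = |\Psi(\mathbf r)|^2$, and all wave functions are normalized so that

$$\int |\Psi(\mathbf r)|^2\,d^3x = 1 \qquad\text{and}\qquad P(\mathbf r \in V) = \int_V |\Psi(\mathbf r)|^2\,d^3x, \qquad (32.47)$$

where $\mathbf r$ is a set of convenient coordinates.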

The average and variance are defined in exactly the same way as in the discrete case. For example, the average of the $i$th component of $\mathbf x$, denoted by $\langle X_i\rangle$, is given by

$$\langle X_i\rangle = \int_\Omega x_i\,f(\mathbf x)\,d^m x. \qquad (32.48)$$

Similarly

$$\sigma^2(X_i) = \int_\Omega \big(x_i - \langle X_i\rangle\big)^2 f(\mathbf x)\,d^m x, \qquad (32.49)$$

and

$$\langle g(X)\rangle = \int_\Omega g(\mathbf x)\,f(\mathbf x)\,d^m x. \qquad (32.50)$$

Example 32.4.1. The ground state of the hydrogen atom is described in spherical coordinates by the wave function $Ae^{-r/a_0}$, where $a_0$ is the Bohr radius, $A$ is a constant to be determined by the normalization (32.47), and $r$ is the distance of the electron to the nucleus, which is placed at the origin. Using the volume element in spherical coordinates, we have

$$1 = A^2\int_0^\infty\!\!\int_0^\pi\!\!\int_0^{2\pi} e^{-2r/a_0}\,r^2\sin\theta\,dr\,d\theta\,d\varphi = 4\pi A^2\int_0^\infty e^{-2r/a_0}\,r^2\,dr = \pi a_0^3 A^2,$$

since $\int_0^\infty e^{-2r/a_0}r^2\,dr = a_0^3/4$. This gives $A = 1/\sqrt{\pi a_0^3}$. Thus the normalized wave function for the ground state of the hydrogen atom is

$$\Psi(r,\theta,\varphi) = \frac{1}{\sqrt{\pi a_0^3}}\,e^{-r/a_0},$$

with $|\Psi(r,\theta,\varphi)|^2$ being the probability density of finding the electron at $(r,\theta,\varphi)$.

From this, we can calculate, for instance, the probability that the electron approaches the nucleus to within 10% of the Bohr radius. The second equation in (32.47) gives the answer, where $V$ is the volume of a sphere with radius $0.1a_0$. Therefore,

$$P(\mathbf r \in V) = \frac{1}{\pi a_0^3}\int_0^{0.1a_0}\!\!\int_0^\pi\!\!\int_0^{2\pi} e^{-2r/a_0}\,r^2\sin\theta\,dr\,d\theta\,d\varphi = \frac{4}{a_0^3}\int_0^{0.1a_0} e^{-2r/a_0}\,r^2\,dr = \frac12\int_0^{0.2} e^{-x}x^2\,dx = 1 - 1.22\,e^{-0.2} \approx 0.0011,$$

where we used the change of variables $r = a_0x/2$ in the last integral to turn it into a numerical factor.

We can also calculate some averages. For instance, the average of the $x$ coordinate of the electron is (remember that $x = r\sin\theta\cos\varphi$)

$$\langle X\rangle = \frac{1}{\pi a_0^3}\int_0^\infty\!\!\int_0^\pi\!\!\int_0^{2\pi} r\sin\theta\cos\varphi\; e^{-2r/a_0}\,r^2\sin\theta\,dr\,d\theta\,d\varphi = 0.$$

The result is zero because of the $\varphi$ integration: $\int_0^{2\pi}\cos\varphi\,d\varphi = 0$. Similarly, $\langle Y\rangle$ and $\langle Z\rangle$ also vanish. This null result should be expected because it is just as likely for the electron to have a positive $x$ value as it is to have a negative value. On the other hand, $r$ is always positive, and we expect its average value to be nonzero. In fact,

$$\langle R\rangle = \frac{1}{\pi a_0^3}\int_0^\infty\!\!\int_0^\pi\!\!\int_0^{2\pi} r\,e^{-2r/a_0}\,r^2\sin\theta\,dr\,d\theta\,d\varphi = \frac{4}{a_0^3}\int_0^\infty e^{-2r/a_0}\,r^3\,dr = \frac32\,a_0,$$

since $\int_0^\infty e^{-2r/a_0}r^3\,dr = 3a_0^4/8$. ■
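The integrals of Example 32.4.1 are easy to check numerically. The sketch below rests on our own choices: it measures lengths in units of $a_0$ (so $a_0 = 1$), does the angular integrations analytically (they contribute a factor $4\pi$), and evaluates the remaining radial integrals with a hand-written composite Simpson rule (`simpson` is our own helper, not a library function). It should reproduce the normalization, $\langle R\rangle = \tfrac32 a_0$, and $P(r < 0.1a_0) = \tfrac12\int_0^{0.2}x^2e^{-x}\,dx = 1 - 1.22\,e^{-0.2} \approx 0.00115$.

```python
import math

def simpson(f, a, b, n=20_000):
    """Composite Simpson's rule for the integral of f over [a, b] with n (even) subintervals."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3

A0 = 1.0  # Bohr radius; all lengths are measured in units of a0

def radial_density(r):
    """|Psi|^2 r^2 integrated over angles: 4 pi r^2 e^(-2r/a0) / (pi a0^3)."""
    return 4.0 * r**2 * math.exp(-2.0 * r / A0) / A0**3

R_MAX = 40.0 * A0  # beyond this the integrand is of order e^(-80): negligible

norm = simpson(radial_density, 0.0, R_MAX)                     # should be 1
p_inside = simpson(radial_density, 0.0, 0.1 * A0)              # P(r < 0.1 a0)
mean_r = simpson(lambda r: r * radial_density(r), 0.0, R_MAX)  # <R>

print(f"normalization = {norm:.6f}")
print(f"P(r < 0.1 a0) = {p_inside:.6f}")
print(f"<R> / a0      = {mean_r / A0:.6f}")
```

Replacing the radial factor with `r * radial_density(r)` or other moments in the same way gives any other radial average of this state.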


A random variable $x_\alpha$ is said to be independent of the rest of the variables if the probability density $f(\mathbf x)$ factors out into

$$f(\mathbf x) = g(x_\alpha)\,h(x_1, x_2, \ldots, x_{\alpha-1}, x_{\alpha+1}, \ldots, x_m) \equiv g(x_\alpha)\,h_\alpha(\mathbf x),$$

where $h_\alpha(\mathbf x)$ is a function of all $x$’s except $x_\alpha$. By multiplying it by a constant (and dividing $h_\alpha$ by the same constant), we can always choose $g(x_\alpha)$ in such a way that

$$\int_{-\infty}^{\infty} g(x_\alpha)\,dx_\alpha = 1. \qquad (32.51)$$


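Then since

$$1 = \int_\Omega f(\mathbf x)\,d^m x = \int_{-\infty}^{\infty} g(x_\alpha)\,dx_\alpha \int_{\Omega'} h_\alpha(\mathbf x)\,d^{m-1}x,$$

where $\Omega'$ is the region of integration of the remaining variables, we also have

$$\int_{\Omega'} h_\alpha(\mathbf x)\,d^{m-1}x = 1.$$

From this we conclude that the average of any function depending on $x_\alpha$ alone can be calculated using not the whole density $f(\mathbf x)$, but $g(x_\alpha)$. In particular,

$$\langle X_\alpha\rangle = \int_{-\infty}^{\infty} x_\alpha\,g(x_\alpha)\,dx_\alpha, \qquad \sigma^2(X_\alpha) = \int_{-\infty}^{\infty}\big(x_\alpha - \langle X_\alpha\rangle\big)^2 g(x_\alpha)\,dx_\alpha. \qquad (32.52)$$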

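Define $\operatorname{cov}(X_\alpha, X_\beta)$, the covariance of $x_\alpha$ and $x_\beta$, for a general density function $f(\mathbf x)$, by

$$\operatorname{cov}(X_\alpha, X_\beta) = \int_\Omega \big(x_\alpha - \langle X_\alpha\rangle\big)\big(x_\beta - \langle X_\beta\rangle\big) f(\mathbf x)\,d^m x = \Big\langle \big(X_\alpha - \langle X_\alpha\rangle\big)\big(X_\beta - \langle X_\beta\rangle\big)\Big\rangle, \qquad (32.53)$$

and note that by (32.49),

$$\operatorname{cov}(X_\alpha, X_\alpha) = \sigma^2(X_\alpha), \qquad \operatorname{cov}(X_\beta, X_\beta) = \sigma^2(X_\beta). \qquad (32.54)$$

Now suppose that $x_\alpha$ is independent of the rest of the variables and $\beta \ne \alpha$. Then

$$\operatorname{cov}(X_\alpha, X_\beta) = \int_{-\infty}^{\infty}\big(x_\alpha - \langle X_\alpha\rangle\big) g(x_\alpha)\,dx_\alpha \int_{\Omega'}\big(x_\beta - \langle X_\beta\rangle\big) h_\alpha(\mathbf x)\,d^{m-1}x = 0.$$

The result follows from the fact that the integration over $\Omega'$ is a constant and the integral over $x_\alpha$ can be done independently:

$$\int_{-\infty}^{\infty}\big(x_\alpha - \langle X_\alpha\rangle\big) g(x_\alpha)\,dx_\alpha = \underbrace{\int_{-\infty}^{\infty} x_\alpha\,g(x_\alpha)\,dx_\alpha}_{=\langle X_\alpha\rangle \text{ by (32.52)}} - \langle X_\alpha\rangle\underbrace{\int_{-\infty}^{\infty} g(x_\alpha)\,dx_\alpha}_{=1 \text{ by (32.51)}} = 0.$$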


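The preceding discussion shows that $\operatorname{cov}(X_\alpha, X_\beta)$ is a measure of how much $x_\alpha$ depends on the rest of the variables. If $x_\alpha$ is independent of them, then $\operatorname{cov}(X_\alpha, X_\beta) = 0$; if it is not, then $\operatorname{cov}(X_\alpha, X_\beta)$ is in general nonzero, although a vanishing covariance alone does not guarantee independence. A quantity related to $\operatorname{cov}(X_\alpha, X_\beta)$, called correlation, is

$$\operatorname{cor}(X_\alpha, X_\beta) = \frac{\operatorname{cov}(X_\alpha, X_\beta)}{\sigma(X_\alpha)\,\sigma(X_\beta)}. \qquad (32.55)$$

The “strongest” correlation occurs when $\alpha = \beta$, in which case

$$\operatorname{cor}(X_\alpha, X_\alpha) = \frac{\operatorname{cov}(X_\alpha, X_\alpha)}{\sigma^2(X_\alpha)} = 1 \quad\text{by (32.54)}.$$

The “weakest” correlation occurs when $\alpha \ne \beta$ and $x_\alpha$ is independent of the rest of the variables, in which case $\operatorname{cor}(X_\alpha, X_\beta) = 0$. Thus, $\operatorname{cor}(X_\alpha, X_\beta)$ indeed measures how much $x_\alpha$ and $x_\beta$ are correlated. Problem 32.22 shows that $|\operatorname{cor}(X_\alpha, X_\beta)| \le 1$.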
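Covariance and correlation have direct sample analogues: replace the integrals over $f(\mathbf x)$ by averages over many draws. The Python sketch below is illustrative only; the sample size, the seed, and the particular variables are our own choices. $X$ and $Y$ are drawn independently from a standard normal distribution, $Z = X + 0.5\,Y$ depends on $X$, and $W = X^2$ is completely determined by $X$. The estimates of $\operatorname{cor}(X, \cdot)$ should come out near $1$ (for $X$ itself), $0$, $1/\sqrt{1.25} \approx 0.894$, and $0$, respectively; the last case shows that a vanishing correlation does not by itself imply independence.

```python
import math
import random

def mean(xs):
    return sum(xs) / len(xs)

def cov(xs, ys):
    """Sample analogue of the covariance: the average of (x - <X>)(y - <Y>)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

def cor(xs, ys):
    """Sample analogue of the correlation: cov(X, Y) / (sigma(X) sigma(Y))."""
    return cov(xs, ys) / math.sqrt(cov(xs, xs) * cov(ys, ys))

rng = random.Random(0)  # fixed seed so the run is reproducible
N = 100_000
x = [rng.gauss(0.0, 1.0) for _ in range(N)]
y = [rng.gauss(0.0, 1.0) for _ in range(N)]   # drawn independently of x
z = [xi + 0.5 * yi for xi, yi in zip(x, y)]   # depends linearly on x
w = [xi * xi for xi in x]                     # a function of x, yet uncorrelated with it

for name, other in (("X", x), ("Y", y), ("Z", z), ("W", w)):
    print(f"cor(X, {name}) = {cor(x, other):+.4f}")
```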


32.4.1 Transformation of Variables 

Sometimes it is necessary or convenient to change a given set of random variables to another set. Suppose that $\mathbf x = \{x_i\}_{i=1}^m$ is a set of variables, and $\mathbf u = \{u_i\}_{i=1}^m$ are new variables of which the $x_i$ are functions. Given a density $f(\mathbf x)$, the probability of finding $\mathbf x$ in an infinitesimal volume $d^m x$ is $f(\mathbf x)\,d^m x$. What is the corresponding probability in terms of the $\mathbf u$ variables? What is the probability density $g(\mathbf u)$ so that $g(\mathbf u)\,d^m u$ is the probability that $\mathbf u$ lies in the infinitesimal volume $d^m u$? The answer is

$$g(\mathbf u) = f\big(x_1(\mathbf u), x_2(\mathbf u), \ldots, x_m(\mathbf u)\big)\,J(\mathbf x, \mathbf u), \qquad (32.56)$$

where $J(\mathbf x, \mathbf u)$ is the absolute value of the Jacobian of the $\mathbf x$-to-$\mathbf u$ transformation, whose special cases in two and three dimensions were given in (6.65) and (6.66). Equation (32.56) is obtained from $f(\mathbf x)\,d^m x$ by writing the $x$’s in terms of the $u$’s, keeping in mind that $d^m x = J(\mathbf x, \mathbf u)\,d^m u$.

In most cases, there are only two variables $x$ and $y$, which are transformed into $u$ and $v$. Then (32.56) yields

$$g(u,v) = f\big(x(u,v), y(u,v)\big)\left|\det\begin{pmatrix}\partial x/\partial u & \partial x/\partial v\\ \partial y/\partial u & \partial y/\partial v\end{pmatrix}\right|. \qquad (32.57)$$


Example 32.4.2. The random variables $x$ and $y$ have the density function

$$f(x,y) = \begin{cases} c(x+y)e^{-x} & \text{if } 0<x,\ 0<y<1;\\ 0 & \text{otherwise,}\end{cases} \qquad (32.58)$$

where $c$ is a positive constant. What is the density function $h(u)$ for the sum $u = x + y$?

As will become clear below, it is convenient to write $f(x,y)$ in terms of the $\theta$ function introduced in Section 5.1.3, Equation (5.18):

$$f(x,y) = c\,\theta(x)\theta(y)\theta(1-y)(x+y)e^{-x}. \qquad (32.59)$$
The reader is urged to verify that this is identical to (32.58). Let $x = v$. Then $u = x + y$ gives $y = u - v$, and the Jacobian for the transformation is

$$\left|\det\begin{pmatrix}\partial x/\partial u & \partial x/\partial v\\ \partial y/\partial u & \partial y/\partial v\end{pmatrix}\right| = \left|\det\begin{pmatrix}0 & 1\\ 1 & -1\end{pmatrix}\right| = 1. \qquad (32.60)$$

Therefore, all that is needed is to replace $x$ and $y$ in $f(x,y)$:

$$g(u,v) = c\,\theta(v)\theta(u-v)\theta(1-u+v)\,u\,e^{-v}.$$

This is the convenience we mentioned above: we don’t have to worry about different cases corresponding to different limits of $u$ and $v$; the $\theta$ function automatically takes care of that!

To find $h(u)$, we need, by definition, to integrate over all values of $v$. Because of the last $\theta$ factor in $g(u,v)$, we need to consider two cases: $0 < u < 1$ and $u > 1$. In the first case, $\theta(1-u+v) = 1$ because the first $\theta$ function requires $v$ to be positive. Then the middle $\theta$ function sets the upper limit of $v$ integration to $u$. Hence,

$$h(u) = \int_{-\infty}^{\infty} g(u,v)\,dv = \int_0^u c\,u\,e^{-v}\,dv = cu\left(1 - e^{-u}\right), \qquad 0<u<1.$$

In the second case, $\theta(1-u+v)$ requires $v$ to be greater than $u-1$, and the middle $\theta$ function still sets the upper limit of $v$ integration to $u$. Therefore,

$$h(u) = \int_{u-1}^{u} c\,u\,e^{-v}\,dv = -cu\,e^{-v}\Big|_{u-1}^{u} = cu\,e^{-u}(e-1), \qquad u>1.$$

The two cases can be combined using the $\theta$ function:

$$h(u) = cu\left[\theta(u)\theta(1-u)\left(1-e^{-u}\right) + \theta(u-1)\,e^{-u}(e-1)\right]. \quad ■$$
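The result of Example 32.4.2 can be checked numerically by integrating $g(u,v) = f(v, u-v)$ over $v$ directly and comparing with the closed form for $h(u)$. In the Python sketch below we fix the constant at $c = 2/3$; this value is our own calculation (it makes the density (32.58) integrate to 1, since $\int_0^1\!\int_0^\infty (x+y)e^{-x}\,dx\,dy = 3/2$) and is not stated in the example. The step function `theta` and the midpoint-rule integration are illustrative choices.

```python
import math

C = 2.0 / 3.0  # normalization constant for (32.58); our own calculation

def theta(t):
    """Unit step function (taken to be 0 at t = 0)."""
    return 1.0 if t > 0 else 0.0

def f(x, y):
    """Joint density (32.59), written with step functions."""
    return C * theta(x) * theta(y) * theta(1 - y) * (x + y) * math.exp(-x)

def h_closed(u):
    """Closed-form density of u = x + y obtained in Example 32.4.2."""
    return C * u * (theta(u) * theta(1 - u) * (1 - math.exp(-u))
                    + theta(u - 1) * math.exp(-u) * (math.e - 1))

def h_numeric(u, n=20_000):
    """Integrate g(u, v) = f(v, u - v) over v with the midpoint rule (the Jacobian is 1)."""
    lo, hi = max(0.0, u - 1.0), u  # g(u, v) vanishes outside this v-interval
    if hi <= lo:
        return 0.0
    dv = (hi - lo) / n
    return dv * sum(f(v, u - v) for v in (lo + (k + 0.5) * dv for k in range(n)))

for u in (0.5, 1.5, 2.0, 5.0):
    print(f"u = {u}: numeric {h_numeric(u):.6f}, closed form {h_closed(u):.6f}")

# The density of u should integrate to 1 (midpoint rule on (0, 50); the tail is negligible).
du = 50.0 / 50_000
total = du * sum(h_closed((k + 0.5) * du) for k in range(50_000))
print(f"integral of h(u) = {total:.6f}")
```

Because `theta(0)` is 0 here, the combined closed form would return 0 at exactly $u = 1$, so the comparison uses points away from that boundary.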


Suppose that $x$ and $y$ are independent random variables with the density function

$$f(x,y) = f_1(x)\,f_2(y).$$

We want to find the density function $h(u)$ of their sum $u = x + y$. Let $x = v$ and $y = u - v$, so that the sum is indeed $x + y$. The Jacobian of the transformation is 1 by (32.60). Therefore, by (32.57),

$$g(u,v) = f\big(x(u,v), y(u,v)\big) = f_1(v)\,f_2(u-v).$$

The density function of $u$ is obtained by integrating $g(u,v)$ over the other variable, $v$. Thus,

$$h(u) = \int_{-\infty}^{\infty} g(u,v)\,dv = \int_{-\infty}^{\infty} f_1(v)\,f_2(u-v)\,dv. \qquad (32.61)$$

The reader may recall from our discussion of the Laplace transform that $h$ is the convolution of $f_1$ and $f_2$.
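Equation (32.61) lends itself to direct numerical evaluation. As an illustration (the example densities are our own choice, not from the text), the Python sketch below convolves two uniform densities on $(0,1)$ with a midpoint rule; the sum of two independent uniform variables is known to have the triangular density $h(u) = u$ for $0<u\le 1$ and $h(u) = 2-u$ for $1<u<2$. The function name `convolve_density` is hypothetical.

```python
def convolve_density(f1, f2, u, lo, hi, n=20_000):
    """Midpoint-rule approximation of h(u) = ∫ f1(v) f2(u - v) dv, restricted to [lo, hi].

    [lo, hi] should contain the support of f1, so that nothing is cut off.
    """
    dv = (hi - lo) / n
    total = 0.0
    for k in range(n):
        v = lo + (k + 0.5) * dv
        total += f1(v) * f2(u - v)
    return total * dv

def uniform01(t):
    """Uniform density on (0, 1)."""
    return 1.0 if 0.0 < t < 1.0 else 0.0

def triangle(u):
    """Exact density of the sum of two independent uniform(0, 1) variables."""
    if 0.0 < u <= 1.0:
        return u
    if 1.0 < u < 2.0:
        return 2.0 - u
    return 0.0

for u in (0.3, 1.0, 1.7, 2.5):
    approx = convolve_density(uniform01, uniform01, u, 0.0, 1.0)
    print(f"u = {u}: convolution {approx:.4f}, exact {triangle(u):.4f}")
```

Because the uniform densities jump at their endpoints, the midpoint rule is only first-order accurate here; the error is of the order of the step size.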
Example 32.4.3. Assume that $x$ and $y$ are independent variables with

$$f_1(x) = \frac{1}{\pi(x^2+1)}, \quad -\infty < x < \infty, \qquad f_2(y) = \frac{1}{\pi(y^2+1)}, \quad -\infty < y < \infty.$$

Then their sum $u = x + y$ has the density function

$$h(u) = \frac{1}{\pi^2}\int_{-\infty}^{\infty}\frac{dv}{(v^2+1)\big[(u-v)^2+1\big]} = \frac{2}{\pi(u^2+4)}, \qquad -\infty < u < \infty,$$

leaving the verification of the last integration for Problem 32.25. ■
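The closed form in Example 32.4.3 can also be checked numerically before working Problem 32.25. The Python sketch below evaluates the convolution integral with a midpoint rule on a large but finite interval $[-L, L]$; the cutoff $L = 200$ and the step count are our own choices (the integrand falls off like $v^{-4}$, so the neglected tails are tiny). It compares the result with $2/[\pi(u^2+4)]$.

```python
import math

def cauchy(t):
    """The density 1 / (pi (t^2 + 1)) used for both f1 and f2 in Example 32.4.3."""
    return 1.0 / (math.pi * (t * t + 1.0))

def h_numeric(u, half_width=200.0, n=100_000):
    """Midpoint-rule estimate of h(u) = ∫ f1(v) f2(u - v) dv, truncated to [-L, L]."""
    dv = 2.0 * half_width / n
    total = 0.0
    for k in range(n):
        v = -half_width + (k + 0.5) * dv
        total += cauchy(v) * cauchy(u - v)
    return total * dv

def h_closed(u):
    """The closed form quoted in Example 32.4.3."""
    return 2.0 / (math.pi * (u * u + 4.0))

for u in (0.0, 1.0, 3.0):
    print(f"u = {u}: numeric {h_numeric(u):.6f}, closed form {h_closed(u):.6f}")
```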


32.4.2 Normal Distribution 

One of the most frequently used probability distributions is Gauss’ normal distribution given by

$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\,e^{-(x-m)^2/2\sigma^2}, \qquad -\infty < x < \infty. \qquad (32.62)$$

It can be easily shown that $\langle X\rangle = m$ and, as the notation suggests, the variance is $\sigma^2$.

To find the probability that $x$ lies in the interval $(a, b)$, we have to integrate $f(x)$ from $a$ to $b$:

$$p(a < x < b) = \frac{1}{\sqrt{2\pi}\,\sigma}\int_a^b e^{-(x-m)^2/2\sigma^2}\,dx.$$

Let $y = (x-m)/(\sqrt2\,\sigma)$ and substitute for $x$ in terms of $y$. Then

$$p(a < x < b) = \frac12\left[\operatorname{erf}\left(\frac{b-m}{\sqrt2\,\sigma}\right) - \operatorname{erf}\left(\frac{a-m}{\sqrt2\,\sigma}\right)\right], \qquad (32.63)$$

where erf is the error function introduced in Equation (11.9). The error function has been tabulated precisely because of its relation to the normal distribution.

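Suppose $a$ and $b$ are given in terms of their distance from the mean as a multiple of the standard deviation: $a = m + k_1\sigma$ and $b = m + k_2\sigma$. Then we have the important relation

$$p(m + k_1\sigma < x < m + k_2\sigma) = \frac12\left[\operatorname{erf}\left(k_2/\sqrt2\right) - \operatorname{erf}\left(k_1/\sqrt2\right)\right]. \qquad (32.64)$$

In particular, if $k_1 = -k_2 = -k$, then, since erf is an odd function,

$$p(m - k\sigma < x < m + k\sigma) = \frac12\left[\operatorname{erf}\left(k/\sqrt2\right) - \operatorname{erf}\left(-k/\sqrt2\right)\right] = \operatorname{erf}\left(k/\sqrt2\right). \qquad (32.65)$$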
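Equations (32.63) and (32.65) translate directly into code, since Python's standard library provides `math.erf`. The helper names `prob_between` and `prob_within` below are our own. For $k = 1, 2, 3$ the sketch gives the familiar fractions of a normal population lying within $k$ standard deviations of the mean.

```python
import math

def prob_between(a, b, m, sigma):
    """p(a < x < b) for a normal density with mean m and standard deviation sigma, as in (32.63)."""
    s = math.sqrt(2.0) * sigma
    return 0.5 * (math.erf((b - m) / s) - math.erf((a - m) / s))

def prob_within(k):
    """p(m - k sigma < x < m + k sigma); since erf is odd, (32.65) reduces to erf(k / sqrt(2))."""
    return math.erf(k / math.sqrt(2.0))

for k in (1, 2, 3):
    print(f"k = {k}: {prob_within(k):.4f}")  # one line each: 0.6827, 0.9545, 0.9973
```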
$$g(u,v) = \frac{|v|}{2\pi\sigma^2}\,e^{-(u^2+1)v^2/2\sigma^2}.$$

To find $h(u)$ we integrate $g(u,v)$ over all values of $v$, namely $-\infty$ to $\infty$. Since the integrand is even, we can integrate from 0 to $\infty$ and multiply the result by 2. Therefore,

$$h(u) = \frac{1}{\pi\sigma^2}\int_0^\infty e^{-(u^2+1)v^2/2\sigma^2}\,v\,dv = \frac{1}{\pi(u^2+1)}.$$

The integration is straightforward as the reader can verify. 

Equation (32.29), when written in the form (replacing $m$ with $x$)

$$P(x,n) \approx \sqrt{\frac{2}{\pi n}}\;e^{-(x-n/2)^2/(n/2)},$$

displays the similarity of the binomial distribution and the normal distribution for the special case of $p = q = \tfrac12$. The equation shows that the mean is $n/2$ and the variance $n/4$. We now generalize this to arbitrary $p$ and $q$.

Using the Stirling approximation $x! \approx \sqrt{2\pi}\,e^{-x}x^{x+1/2}$ and replacing $m$ with $x$, we write (32.27) as

$$P(x,n) \approx \frac{\sqrt{2\pi}\,e^{-n}n^{n+1/2}\,p^x q^{n-x}}{\sqrt{2\pi}\,e^{-x}x^{x+1/2}\,\sqrt{2\pi}\,e^{-(n-x)}(n-x)^{n-x+1/2}} = \frac{n^{n+1/2}\,p^x q^{n-x}(n-x)^x}{\sqrt{2\pi}\,x^{x+1/2}(n-x)^{n+1/2}}$$

or, pulling out the power of 1/2 and collecting all terms with equal powers together, we obtain

$$P(x,n) \approx \sqrt{\frac{n}{2\pi x(n-x)}}\left(\frac{nq}{n-x}\right)^{n}\left(\frac{p(n-x)}{xq}\right)^{x}.$$

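In the approximation we are seeking, we assume that $x$ is close to the mean $np$ and write $x = np + \delta$, where $\delta$ is small compared to $np$ (and to $nq$). Then we get

$$P(x,n) \approx \sqrt{\frac{n}{2\pi(np+\delta)(n-np-\delta)}}\left(\frac{nq}{n-np-\delta}\right)^{n}\left(\frac{p(n-np-\delta)}{(np+\delta)q}\right)^{np+\delta} = \sqrt{\frac{n}{2\pi(np+\delta)(nq-\delta)}}\left(\frac{nq}{nq-\delta}\right)^{n}\left(\frac{npq-p\delta}{npq+q\delta}\right)^{np+\delta}.$$

Since $\delta$ is small compared to $np$ and $nq$, the square root is approximately $1/\sqrt{2\pi npq}$, and

$$P(x,n) \approx \frac{1}{\sqrt{2\pi npq}}\;\underbrace{\left(\frac{1}{1-\delta/nq}\right)^{n}\left(\frac{1-\delta/nq}{1+\delta/np}\right)^{np+\delta}}_{\equiv\,A}. \qquad (32.66)$$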


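To proceed, we take the log of the term we have designated as $A$:

$$\ln A = -n\ln\left(1-\frac{\delta}{nq}\right) + (np+\delta)\left[\ln\left(1-\frac{\delta}{nq}\right) - \ln\left(1+\frac{\delta}{np}\right)\right] = (-nq+\delta)\ln\left(1-\frac{\delta}{nq}\right) - (np+\delta)\ln\left(1+\frac{\delta}{np}\right).$$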






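Expanding the log terms up to the second order yields

$$\ln A \approx (-nq+\delta)\left(-\frac{\delta}{nq} - \frac{\delta^2}{2n^2q^2}\right) - (np+\delta)\left(\frac{\delta}{np} - \frac{\delta^2}{2n^2p^2}\right) \approx -\frac{\delta^2}{2nq} - \frac{\delta^2}{2np} = -\frac{\delta^2}{2npq},$$

so that $A \approx e^{-\delta^2/2npq}$.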

Substituting this in (32.66) with $\delta = x - np$, we obtain

$$P(x,n) \approx \frac{1}{\sqrt{2\pi npq}}\,e^{-(x-np)^2/2npq},$$

which shows that $P(x,n)$ is a normal distribution with mean $np$ and variance $npq$.
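A short numerical experiment makes this limit concrete. The Python sketch below (our own illustration; $p = 0.3$ and the values of $n$ are arbitrary choices) compares the exact binomial probability (32.27) with the normal density of mean $np$ and variance $npq$, point by point. The largest discrepancy is already small for $n = 100$ and shrinks further for $n = 1000$.

```python
import math

def binomial_pmf(x, n, p):
    """Exact binomial probability P(x, n) = C(n, x) p^x q^(n - x)."""
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)

def normal_approx(x, n, p):
    """Normal density with mean np and variance npq, evaluated at x."""
    var = n * p * (1 - p)
    return math.exp(-(x - n * p) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def worst_difference(n, p):
    """Largest pointwise |binomial - normal| over x = 0, 1, ..., n."""
    return max(abs(binomial_pmf(x, n, p) - normal_approx(x, n, p)) for x in range(n + 1))

p = 0.3
for n in (100, 1000):
    print(f"n = {n:4d}: largest pointwise difference = {worst_difference(n, p):.2e}")
```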

It can also be shown that the Poisson distribution (32.43) approaches the normal distribution when $n$ and $\lambda$ are both large and $\delta = n - \lambda$ is small compared to both:

$$p(n) \approx \frac{1}{\sqrt{2\pi\lambda}}\,e^{-(n-\lambda)^2/2\lambda}.$$

We therefore have the law of large numbers: 


Box 32.4.2. In the limit that the random variable and the mean go to infinity, both the binomial and Poisson distributions approach Gauss’ normal distribution.
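The Poisson half of this statement can be explored numerically as well. In the Python sketch below (an illustration with our own choice of $\lambda$ values), the Poisson probabilities are computed through logarithms with `math.lgamma` so that large $n$ and $\lambda$ do not overflow, and they are compared with the normal density of mean and variance $\lambda$. The largest pointwise discrepancy should decrease steadily as $\lambda$ grows.

```python
import math

def poisson_pmf(n, lam):
    """p(n) = lam^n e^(-lam) / n!, evaluated via logarithms to avoid overflow."""
    return math.exp(n * math.log(lam) - lam - math.lgamma(n + 1))

def normal_approx(n, lam):
    """Normal density with mean lam and variance lam, evaluated at n."""
    return math.exp(-(n - lam) ** 2 / (2 * lam)) / math.sqrt(2 * math.pi * lam)

worst_by_lam = {}
for lam in (5, 50, 500):
    # scan well past the mean; the largest differences occur near n = lam
    worst_by_lam[lam] = max(abs(poisson_pmf(n, lam) - normal_approx(n, lam)) for n in range(3 * lam))
    print(f"lambda = {lam:3d}: largest pointwise difference = {worst_by_lam[lam]:.2e}")
```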


The normal distribution is a remarkable density function. We just saw that both binomial and Poisson distributions approach it in the limit of large $n$. But it goes beyond these two distributions. In fact, it can be shown that the sum (or average) of a large number of independent, identically distributed random variables, whose common distribution is arbitrary apart from having a finite variance, is approximately normally distributed. This is the content of the central limit theorem, and the reason that the normal distribution is the distribution of choice in many statistical applications.
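A small simulation illustrates the central limit theorem. In the Python sketch below (our own illustration; the uniform distribution, the seed, and the trial count are arbitrary choices), we add $k$ independent random numbers uniformly distributed on $(0,1)$, standardize the sum to mean 0 and variance 1, and record the fraction of trials that land within one standard deviation of the mean. For $k = 1$ the exact value is $1/\sqrt3 \approx 0.577$; as $k$ grows the fraction should approach the normal value $\operatorname{erf}(1/\sqrt2) \approx 0.683$.

```python
import math
import random

rng = random.Random(1)  # fixed seed for reproducibility

def standardized_sum(k):
    """Sum of k independent uniform(0, 1) numbers, shifted and scaled to mean 0 and variance 1."""
    s = sum(rng.random() for _ in range(k))
    return (s - k / 2) / math.sqrt(k / 12)

TRIALS = 50_000
fractions = {}
for k in (1, 2, 12):
    hits = sum(1 for _ in range(TRIALS) if abs(standardized_sum(k)) < 1.0)
    fractions[k] = hits / TRIALS
    print(f"k = {k:2d}: fraction within one standard deviation = {fractions[k]:.3f}")

print(f"normal value erf(1/sqrt(2)) = {math.erf(1 / math.sqrt(2)):.3f}")
```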


32.5 Problems 

32.1. Using Venn diagrams, show that the operation of union distributes over intersection and vice versa:

$$A \cap (B \cup C) = (A \cap B) \cup (A \cap C),$$

$$A \cup (B \cap C) = (A \cup B) \cap (A \cup C).$$

32.2. Using Venn diagrams, show that

$$A \cap (B - C) = (A \cap B) - (A \cap C),$$

$$A - (B \cup C) = (A - B) - C.$$


32.3. Using Venn diagrams, show that

$$(A \cup B)^c = A^c \cap B^c \qquad\text{and}\qquad (A \cap B)^c = A^c \cup B^c.$$

32.4. Fill in the rest of the following table assuming that all probabilities pij 
are independent. 



Ei 

f 2 

Total 

Ex 



0.3 

e 2 




Total 

0.4 




32.5. Fill in the rest of the following table assuming that all probabilities 
are independent. 



Ei 

e 2 

e 3 

Total 

E x 




0.3 

e 2 





e 3 




0.5 

Total 

0.1 


0.7 



32.6. Prove Equation (32.22). 

32.7. What is the probability of obtaining 400 heads in 800 coin tosses? Of 
obtaining more than 500 heads? Of obtaining between 350 and 450 heads? 

32.8. A graphic calculator is needed! Plot the binomial distribution $P(m,n)$ as a function of $m$ for $n = 50$ and $p = q = \tfrac12$ using the exact formula (32.27).

(a) From the plot estimate $m_-$, the largest value on the left of the maximum at which the probability is (almost) zero, and $m_+$, the smallest value on the right of the maximum at which the probability is (almost) zero. Compare these values with (32.33).

(b) Sum the exact formula from $m_-$ to $m_+$ to find the probability that the number of successes lies between $m_-$ and $m_+$.

(c) Using the Stirling approximation (32.28), estimate the probability found in (b) and compare the two values.

32.9. Example 32.2.2 used the exponential approximation to the binomial distribution because the number of spins was assumed to be very large. Now assume two systems with $n_1 = 8$ and $n_2 = 12$, and the total energy being exchanged is represented by $s = 4$.

(a) Find $\bar s_1$ and $\bar s_2$, and show that $\bar s_1/n_1$ is (approximately) equal to $\bar s_2/n_2$ and $s/n$.

(b) Find the ratio of the probability that $s_1 = \bar s_1 - 1$ (and therefore, $s_2 = \bar s_2 + 1$) to the maximum probability. How does this compare with the same ratio found in Example 32.2.2?
32.10. Using the Stirling approximation $x! \approx \sqrt{2\pi}\,e^{-x}x^{x+1/2}$ of Equation (11.6), show that

$$\frac{n!}{m!(n-m)!} \approx 2^n\sqrt{\frac{2}{\pi n}}\;e^{-(n-2m)^2/2n},$$

assuming that both $m$ and $n - m$ are large.


32.11. For the binomial distribution,

(a) show that

$$\sum_{m=0}^{n} P(m,n) = \sum_{m=0}^{n}\binom{n}{m}p^m q^{n-m} = 1,$$

(b) and that

$$\int_{-\infty}^{\infty} P(s,n)\,ds = \int_{-\infty}^{\infty}\frac{1}{\sqrt{2\pi n}}\,e^{-s^2/2n}\,ds = 1.$$


32.12. Let $r = P(m,n)/P(n/2,n)$. Solve for $m$ and show that

$$m = \tfrac12\left(n \pm \sqrt{-2n\ln r}\right).$$

32.13. Show that if $a/b = c/d$ then $a/b = (a+c)/(b+d)$.

32.14. Derive (32.37) from (32.35). 

32.15. Using the definitions of average and variance, show that $\sigma^2 = \langle X^2\rangle - \langle X\rangle^2$.

32.16. Show directly that (32.43) satisfies $\sum_{n=0}^{\infty} p(n) = 1$.

32.17. A city had two earthquakes in a century. Find the probability that in 
this city, there will be one earthquake 

(a) next year, 

(b) in the next 50 years. 

(c) What is the probability of three or more independent earthquakes in the same month?


32.18. The number of α particles emitted from a sample of a radioactive atom is counted every minute for 50 hours. The total count is 1500.

(a) What is $\lambda$ for this Poisson distribution?

(b) What is the probability that in the next 6 minutes three α particles will be emitted?

(c) What is the probability that in the next 3 minutes at least four α particles will be emitted?


32.19. One of the first excited states of the H-atom has the wave function

$$\Psi(r,\theta,\varphi) = A\,r\,e^{-r/2a_0}\cos\theta.$$

(a) Find $A$ so that $\Psi(r,\theta,\varphi)$ is normalized to 1.

(b) Evaluate $\langle X\rangle$, $\langle Y\rangle$, and $\langle Z\rangle$. Are they all zero? Do you expect them to be?

(c) What is the expectation value of the distance of the electron from the nucleus for this state?

32.20. Suppose that $\phi$ is a function of $x_\alpha$ alone and $x_\alpha$ is independent of the rest of the variables of $f(\mathbf x)$, the density function for a multidimensional probability space. Show that

$$\langle\phi(X_\alpha)\rangle = \int_\Omega \phi(x_\alpha)\,f(\mathbf x)\,d^m x = \int_{-\infty}^{\infty}\phi(x_\alpha)\,g(x_\alpha)\,dx_\alpha.$$

32.21. The uniform probability density function over $(a,b)$ is

$$f(x) = \begin{cases} 1/(b-a) & \text{if } a<x<b;\\ 0 & \text{otherwise.}\end{cases}$$

What is the expectation value $\langle X\rangle$ for this distribution?

32.22. Consider the nonnegative function

$$\chi(t) = \Big\langle\big[t\,(X_\alpha - \langle X_\alpha\rangle) + (X_\beta - \langle X_\beta\rangle)\big]^2\Big\rangle.$$

(a) Show that

$$\chi(t) = t^2\sigma^2(X_\alpha) + 2t\,\operatorname{cov}(X_\alpha, X_\beta) + \sigma^2(X_\beta),$$

which is a parabola in the $t\chi$-plane.

(b) If the parabola is to be nonnegative, it should have at most one real root. Show that for this to happen, the following inequality must hold:

$$\operatorname{cov}^2(X_\alpha, X_\beta) \le \sigma^2(X_\alpha)\,\sigma^2(X_\beta).$$


32.23. Show that 


dtjictt'Yi —r * * * dt r — 


t r 


0 Jo 


/ 0 


/o 


32.24. Let $c$ be a positive constant and

$$f(x,y) = \begin{cases} cx(x+y) & \text{if } 0<x,\ 0<y<1;\\ 0 & \text{otherwise.}\end{cases}$$

Let $u = x + y$. Show that the density $h(u)$ for the variable $u$ is

$$h(u) = \begin{cases} cu^3/2 & \text{if } 0<u<1;\\ cu\left(u - \tfrac12\right) & \text{if } u>1.\end{cases}$$

Show that this can be written as

$$h(u) = cu\left[\theta(u)\theta(1-u)\,u^2/2 + \theta(u-1)\left(u - \tfrac12\right)\right].$$
32.25. Using partial fractions write the integrand of Example 32.4.3 as

$$\frac{1}{(v^2+1)\big[(u-v)^2+1\big]} = \frac{av+b}{v^2+1} + \frac{cv+d}{(u-v)^2+1}.$$

Now write the right-hand side as a single fraction with the same denominator as the left-hand side. Set the coefficients of the powers of $v$ in the numerator equal to zero, except the constant, which must be equal to 1. Find $a$, $b$, $c$, and $d$. Show that the integral becomes

$$(b + d + cu)\int_{-\infty}^{\infty}\frac{dv}{v^2+1} = \frac{2}{u^2+4}\int_{-\infty}^{\infty}\frac{dv}{v^2+1}.$$


32.26. Using the Stirling approximation and a procedure similar to the one used for the binomial distribution in the text, show that in the limit of large $n$ and $\lambda$, the Poisson distribution of Equation (32.43) becomes the normal distribution.


32.27. Certain measurements are assumed to be normally distributed with mean 25 and standard deviation 0.5. What is the probability that a measurement lies between 23 and 27?
cosine transform, 697
Coulomb, 744 
Coulomb’s law, 22, 24 
Coulomb, Charles 
biography, 23 
covariance 

in probability, 803 
covariant derivative, 464-465
covariant differential, 462-464
covariant vector, 445-447
Cramer, 210 
Crelle, 326 

cross product, 7, 28-31 
as a tensor, 447 
Levi-Civita symbols, 458 
parallelepiped volume, 10 
parallelogram area, 9 

curl 

curvilinear coordinates, 431-435
vector field, 391-398 
current density, 379 
and flux, 379 
curvature, 468-471 
scalar, 470 
curve 

parametric equation, 61 
primary, 59 
curvilinear 

vector analysis, 423-435 
curvilinear coordinates 
curl, 431-435 
divergence, 427-431 
gradient, 425-427 
Laplacian, 435 
cycloid, 732 

d’Alembert, 273, 303 
d’Alembert, Jean Le Rond 
biography, 548 
d'Alembert, 743 
damping factor, 311 
DE 

first-order, 551-561 

integrating factor, 553-555 
linear, 556-561 
second-order, 563-570 
de Broglie, 666 
de Moivre theorem, 485 
del operator, 359 
delta 

Kronecker, 442 
delta function 

and Laplacian, 412 
cylindrical, 160 
derivative, 147, 159 
Legendre expansion, 630 
limit of sequence, 492 


one-variable, 139-151 
point sources, 144 
polar, 156 

representation, 491-492 
spherical, 160 
three-variable, 159-165 
two-dimensional, 155 
two-variable, 154-159
density, 45 

current, 379 
flux, 371-381 
of states, 677 
probability, 801 
density function 
surface, 154 
derivative, 44-46

covariant, 464-465
functional, 730 
mixed, 52 
normal, 593 
partial, 47-59 
time 

vector, 350-355 
total, 86 

Descartes, 46, 97, 103, 215, 417, 481, 482 
Descartes, Rene 
biography, 15 

determinant, 202-207, 222-227 
parallelepiped volume, 10 
differential, 53-54 
absolute, 463 
covariant, 462-464 
exact, 553 

differential equation 
Bessel, 548, 641 

recursion relation, 644 
second solution, 645-646 
solutions, 642-645 
confluent hypergeometric, 332 
Hermite 

recursion relation, 668 
hypergeometric, 328 
Legendre, 608 

second solution, 617-619 
order of, 556 
ordinary, 542 
partial, 542 
second-order linear 
adjoint, 572 
integrating factor, 571 
differential operator, 217, 576
diffusion equation, 661 
time-dependent, 663 
dimension, 176 

fractal, 775-778 
dipole 

approximation, 298 
magnetic, 410 
dipole moment, 298 
dipole potential, 299 
Dirac, 26 

biography, 151 
Dirac delta function 

in variational problems, 730 
step function, 153 
Dirac, Paul Adrien Maurice 
biography, 151 
disjoint sets, 783 
distance 

spacetime, 240-242 
distribution, 146 

normal, 806-809 
sum of two, 807 
divergence, 371-381 

curvilinear coordinates, 427-431
spherical coordinates, 430 
theorem, 374-378
vector field, 374
Doppler shift 

relativistic, 255 
dot product, 5, 21 
double del operation, 407-412 
double factorial, 319 
dummy index, 262 
dynamical system 

autonomous, 767 
nonautonomous, 767 

eccentricity, 581 
eigenvalue, 224 
eigenvalue equation, 224 
eigenvector, 224 
Einstein, 215, 666 

summation convention, 441 
Einstein curvature tensor, 471 
Einstein equation, 471 
electric field, 104
point charge, 25 
electrical conductor, 594 
electrodynamics 


Lagrangian density, 745 
tensors, 459-461 
element 

area, 59-68 
Cartesian, 60-62 
cylindrical, 65-68 
length, 59-68 
spherical, 62-64 
volume, 59-68 
elliptic coordinates, 73, 213 
elliptic cylindrical coordinates, 73, 213, 
436 

elliptic functions, 322-326 
elliptic integral 
complete, 324 
first kind, 323
second kind, 323 
empty set, 782 
energy 

relativistic, 249 
zero mass particle, 250 
energy momentum tensor, 471 
equation 

canonical, 749 
Klein-Gordon, 747 
error function, 322, 806 
Euclid, 47, 80, 90 

Euler, 272, 303, 326, 330, 503, 642, 743 
Euler angles, 201 
Euler equation, 483 
Euler’s equation, 414 
Euler, Leonhard 
biography, 321 

Euler-Lagrange equation, 729-731, 
734-736, 738, 739 

event, 784 

compound, 784 
elementary, 784 
random, 781 
exact differential, 553 
expectation value, 790 
extremum problem, 727 
gradient, 359-361 

factorial 

double, 319 

factorial function, 99, 318 
Faraday, 26 
Feigenbaum alpha, 774 
Feigenbaum delta, 773 
Feigenbaum numbers, 775-775
Fermat, Pierre de 
biography, 15 
Fermi energy, 677 
Fermi-Dirac statistics, 791 
Feynman, 26 
field, 21-28, 343 
electric, 104 
scalar, 343 
spinor, 343 
tensor, 343 
vector, 343 
field point, 25, 78 
fine structure constant, 679 
finite constraint problem, 739 
fixed point 

iterated map, 755 
stable, 756 
flat space, 470 
Florence Nightingale, 210 
fluid dynamics, 413-415 
flux, 365-369 

density, 371-381 
vector field, 365-369 
FODE, 551-561 
Bernoulli, 560 
homogeneous, 560 
integrating factor, 553-555 
Lagrange, 561 
linear, 556-561 
normal, 551 
integral of, 552 
FOLDE, 556-561 

explicit solution, 557 

force 

central, 354 
force density, 414 
form factor, 702 
four-acceleration, 248 
four-momentum, 247-250 
four-vector, 243 
four-velocity, 247-250 
Fourier, 115, 279, 322 
Fourier series, 299-303 

complex numbers, 488-491 
to Fourier transform, 693-696 
Fourier transform, 693-712 
and derivatives, 702-703 
and quark model, 702 
application to DEs, 702-704 


convolution theorem, 724 
Coulomb potential 

charge distribution, 701 
point charge, 700 
definition, 695 
examples, 698-702 
Gaussian, 699 
Green’s functions, 705-712 
heat equation 

one-dimensional, 704 
higher dimensions, 696 
inverse, 695 
of delta function, 698 
properties, 696 
Fourier, Joseph 

biography, 304 
Fourier-Bessel series, 655 
fractal, 777 

fractal dimension, 775-778 
free index, 440 
frequency 

natural, 586 

Frobenius method, 608-610, 693 
function 

analytic 

isolated singularity, 525 
principal part, 528 
antiderivative, 87 
as integral, 317-326 
as power series, 327-335 
Bessel, 333-335, 644 

Laplace’s equation, 642-654 
beta, 320 
complex, 497-511 
derivative, 499-503 
residue, 526 

complex hyperbolic, 502 
complex trigonometric, 502 
confluent hypergeometric, 332 
delta 

point sources, 144 
elliptic, 322-326 
error, 322 
even, 84 
factorial, 318 
gain, 586 
gamma, 318-319 

Stirling approximation, 319 
harmonic, 501 
homogeneous, 57-59 
hypergeometric, 328-330
integral representation, 329 
iterated map, 755 
linear density, 143 
logistic map, 755 
odd, 84 
periodic, 299 
piecewise continuous, 82 
primitive, 87 
rational 

integral, 529-531 
sequence, 274-279
series, 274-279 
special, 550 
transfer, 586 
functional, 728 
functional derivative, 730 
fundamental theorem of algebra, 478 
fundamental theorem of calculus, 87 

G-orthogonal, 187, 219 
matrix, 191, 222 
space, 200 
vector 

in space, 199 
gain function, 586 
Galileo, 26, 90, 97, 325 
gamma function, 318-319 

Stirling approximation, 319 
gauge transformation, 418 
Gauss, 279, 321, 326, 503, 617, 641 
Gauss elimination, 231 
Gauss’s law, 369 

differential form, 378 
integral form, 377 
Gauss, Johann Carl Friedrich 
biography, 330 
Gaussian 

Fourier transform of, 699 
Gay-Lussac, 594 
generalized coordinates, 741 
generalized momentum, 748 
generating function 

Hermite polynomials, 673 
geodesic, 465 

relativity, 466 
sphere, 466 
geometric series, 271 
geometry 

and metric tensor, 456 


distance formula, 241 
Gibb’s phenomenon, 302 
Gibbs, 370 

Gibbs, Josiah Willard 
biography, 381 
Goldbach, 320 
gradient, 355-361, 445 
components, 440 
curvilinear coordinates, 425-427 
normal to surface, 358 
three dimensions, 357 
two dimensions, 357 
Gram-Schmidt process, 221 
for space, 199 
Grassmann, 382 
Green, 210 
Green’s function 
advanced, 712 
differential eq. for, 707
heat equation, 709-710 
Laplacian, 708-709 
Poisson equation, 709 
retarded, 712 
wave equation, 711-712 
Green’s Functions, 705-712 
Gregory, 272, 294 
guided wave, 682-686 
TE, 684 
TEM, 685 
TM, 684 

Halley, 641 
Hamilton, 369, 382 
Hamilton, William R. 

biography, 10 
Hamiltonian, 747-749 
harmonic oscillator 
quantum, 667 
Hermite DE, 668 
heat-conducting plate 
circular, 664-665
rectangular, 663-664 
heat-conducting rod, 662-663 
heat conductor, 598 
heat equation, 543, 661-665 
Green’s function, 709-710 
one-dimensional, 704 
heat transfer 

time-dependent, 663 
heat-conducting rod, 662
Heaviside, 370 
Heaviside, Oliver 
biography, 382 
Heisenberg, 26, 151, 675 
Heisenberg uncertainty relation, 699 
Helmholtz Coil, 291-293 
Helmholtz free energy, 54 
Hermite DE 

recursion relation, 668 
Hermite polynomial, 670 
orthogonality, 672 
Hermite polynomials, 229, 575 
generating function, 673 
Hermite, Charles 
biography, 674 
HNOLDE, 575 

characteristic polynomial, 576 
homogeneous 
function, 57 

homogeneous function, 57-59 
homogeneous SOLDE 
exact, 571 
Hooke, 97 

Hopf bifurcation, 770 
HSOLDE, 564 

second solution, 568 
Huygens, 97, 103 
hydrogen atom, 677-680, 802
hyperbolic cosine, 290 
hyperbolic sine, 290 
hypergeometric function, 328-330 
confluent, 332-333 
integral representation, 329 

identity matrix, 180 
indeterminate form, 294-297
index 

free, 440 

indicial equation, 609 
induction 

mathematical, 265-266 
inductive definition, 266 
infinite series, 266-274 
inner product, 218-222 
positive definite, 187 
Riemannian, 187 
inner product matrix, 185 
integral, 79 

as function, 317-326 


Bessel’s, 652 
derivative of, 85-86 
function of trigonometric, 534-536
indefinite, 87 
line, 387-391 
Mellin inversion, 722 
rational function, 529-531 
rational trigonometric, 532-534 
integral transform, 693 
kernel, 693 
integrand, 80 

integrating factor, 553-555 
integration, 77-80 
application 

Cartesian coordinates, 104-107,
112, 115-117 

cylindrical coordinates, 107-109, 
112-115, 118-119 
double integrals, 115-122 
electricity, 104-109 
general, 91-96 
gravity, 104-109
magnetostatics, 109-115 
mechanics, 101-103 
single integral, 101-115 
spherical coordinates, 120-122 
triple integrals, 122-128 
Cauchy integral formula, 508-509 
change of dummy variable, 82 
complex function, 503-508 
interchange of limits, 82 
linearity, 82 
parameter, 80 
partition of range, 82 
point, 79 
properties, 81-89 
region of, 79 
small region, 83 
symmetric range, 84 
transformation of variable, 83 
variable, 80 
intersection, 783 
inverse 

matrix, 203, 207 
of a matrix, 180 
inverse Fourier transform, 695 
ionic crystal 

one-dimensional, 145 
potential energy, 164 
two-dimensional, 157 
ISOLDE, 569
isoperimetric problem, 738 
iterated map, 754-763
fixed point, 755 
orbit, 755 

Jacobi, 211, 331 
Jacobi, Carl Gustav Jacob 
biography, 326 
Jacobian, 207-210 

in probability, 804 
Jacobian matrix, 208 

Kaluza, 215 
Kepler, 89, 97 
Kepler’s first law, 582 
Kepler’s second law, 582 
Kepler’s third law, 583 
Kepler, Johannes 
biography, 579 
kernel 

integral transform, 693 
Klein-Gordon equation, 747 
Koch snowflake, 777 
Kronecker delta, 222, 442, 449, 489 
Euclidean metric, 466 
generalized, 452 

Lagrange, 294, 304, 326, 330, 594, 

617, 642 
biography, 742 
Lagrange identity, 572 
Lagrange multiplier, 360, 738 
Lagrangian, 740-745 

interacting particles, 741 
Klein-Gordon, 747 
particle in EM field, 746 
single particle, 741 
Lagrangian density, 744-745
electrodynamics, 745 
Laguerre polynomials, 230, 679 
Laplace, 115, 304, 322, 326, 744 
Laplace transform, 712-723 

and differential equations, 718-721 

Bromwich contour, 722 

convolution, 716 

cosine, 713 

derivative, 717-718 

first shift, 714 

gamma function, 713 


imaginary exponential, 713 
integral, 717-718 
inverse, 721-723 
linearity, 714 

Mellin inversion integral, 722
periodic functions, 716 
properties, 713-717 
second shift, 714 
sine, 713 

step function, 713 
unit function, 713 
Laplace’s equation, 411, 542, 546 
Bessel functions, 642-654 
Cartesian coordinates, 594-603 
cylindrical coordinates, 639-656 
Legendre polynomials, 610-617 
radial equation, 619-622 
solution 

uniqueness, 592 
spherical coordinates, 607-634 
uniqueness of solution, 592-593 
Laplace, Pierre Simon de 
biography, 593 
Laplacian, 411 

and Dirac delta function, 412 
curvilinear coordinates, 435 
Green’s function, 708-709 
Laurent series 

complex, 518-522 
Lavoisier, 744 

law of addition of velocities, 237 
law of large numbers, 809 
law of motion 

relativistic, 253-254 
Legendre, 304, 326 
Legendre equation, 575 

recursion relation, 611 
Legendre functions 
second kind, 618 

Legendre polynomial, 228, 614, 616 
expansion in, 628-630 

physical examples, 631-634 
generating function, 621 
Laplace’s equation, 610-617 
multipole expansion, 621 
orthogonality, 625 
parity, 622 
properties, 622-628 
recurrence relation, 623 
Rodrigues formula, 626 
Legendre polynomials, 229, 575
Legendre transformation, 54, 748 
Legendre, Adrien-Marie 
biography, 617 

Leibniz, 46, 87, 90, 97, 210, 272, 482 
Leibniz, Gottfried Wilhelm 
biography, 103 
length element 
primary, 59 

Levi-Civita symbol, 453 
Levi-Civita symbols 
cross product, 458 
l’Hopital’s rule, 294-297
limit cycle, 770 
line integral, 387-391 
linear combination, 173 
linear dependence, 174 
linear equation, 230-234 
compatible, 231 
echelon form, 232 
homogeneous, 234 
incompatible, 231 
linear independence, 174 
linear operator, 216 
linear transformation, 216-218 
Liouville substitution, 588 
logistic map, 755 

second iterate, 757 
Lorentz gauge, 419 
Lorentz transformation, 243-247 
general, 244 
in 2 dimensions, 245 
lowering indices, 457-459 
Lyapunov exponent, 763 

Maclaurin, 210, 272 
Maclaurin series, 287 
Madelung constant, 165 
magnetic charge, 409 
magnetic dipole moment, 410 
magnetic field 

moving charge, 30 
magnetic force 

current loop, 420 
moving charge, 30 
magnetic monopole, 409 
manifold, 456, 469 
map 

iterated, 754-763
marginal probability, 786-789 


mathematical induction, 265-266 
matrix, 177 

G-orthogonal, 191, 222 
space, 200 
identity, 180 
inner product, 185 
inverse, 180, 203, 207 
Jacobian, 208 
metric, 185 

multiplication rule, 442 
orthogonal, 190 
symmetric, 182 
transformation 
in space, 195 
transpose, 181 
unit, 180 
zero, 180 

Maxwell, 26, 369, 382 
Maxwell’s equations, 415-419 

derivation of wave equation, 417 
relation to relativity, 237 
Maxwell, James Clerk 
biography, 419 

Maxwell-Boltzmann statistics, 791 
mean, 790 

Mellin inversion integral, 722
membrane, 686-687 
metric connection, 465-468 
relativity, 466 
metric matrix, 185 
metric tensor, 454-461
definition, 456 
relativity, 458 
minimal coupling, 749 
Minkowski, 215 
mode 

of oscillation, 682 
Mobius band, 366 
moment 

quadrupole, 449 

moment generating function, 790 
binomial distribution, 793 
Poisson distribution, 799 
moment of inertia, 122 
momentum 

generalized, 748 
relativistic, 249 
zero mass particle, 250 
Monge, 115, 304 
motion

constant of, 552 
multipole expansion, 297-299 

Napoleon, 304, 593 
natural frequency, 586 
Neumann function, 645 
Newton, 16, 26, 43, 46, 78, 87, 90, 103, 
122, 272, 294, 317, 322, 326, 
330, 481, 482, 548, 593 
Newton, Isaac 

biography, 96 
NOLDE, 575 
nonautonomous, 767 
normal distribution, 806-809 
sum of two, 807 
nth iterate, 760 

ODE, 542 

ODE and PDEs, 542-550 

Olbers, 641 

operator 

angular momentum, 412 
spherical coordinates, 435 
del, 359 

differential, 576 
linear, 217 
linear, 216 

orientable surface, 366 
orthogonal 

matrix, 190 
orthogonal polynomial 
standardization, 227 
orthogonal polynomials, 227-230 
orthonormal 
basis, 186 

parabolic coordinates, 73 
paraboloidal coordinates, 74, 437 
parallel translation, 465 
Parseval relation, 654 
Parseval’s relation, 724 
partial derivative, ^7-5P 
particle in a box, 675 
Pascal, 15, 103, 481 
passive transformation, 178 
PDE, 542 

separation 

Cartesian coordinates, 544-546
cylindrical coordinates, 547-548 


spherical coordinates, 548-550 
PDE and ODE, 542-550 
period-doubling, 757 
periodic BC, 574 
permutation, 791 
phase space, 764-766
diagram, 764 
trajectory, 764 
Planck, 666 
plane 

basis, 175 
Poincare, 674 
Poisson, 594 

Poisson distribution, 797-800 
Poisson equation, 411, 542 
astrophysics, 415 
Green’s function, 709 
polar coordinates, 16 
polar equation, 549 
pole 

of order m, 528 
simple, 528 
polynomial 

Hermite, 229, 670 
Laguerre, 230, 679 
Legendre, 228, 229, 614, 616 
Laplace’s equation, 610-617 
orthogonal, 227-230 
standardization, 227 
position vector, 19 
potential, 21-28, 399 
centrifugal, 581 
difference, 399 
of a dipole, 299 
potential energy, 553 
power series, 283-299 
continuity, 285 
differential equations, 307 
differentiation, 285 
integration, 285 
operations, 520 
radius of convergence, 283 
zero, 285 
pressure, 46 
primary curve, 59 
primary surface, 60 
probability 

average, 790 

basic concepts, 781-792 

binomial distribution, 792-797 
conditional, 786-789
correlation, 803 
covariance, 803 
density, 801 
expectation value, 790 
independent random variable, 802 
marginal, 786-789
mean, 790 

moment generating function, 790 
Poisson distribution, 797-800 
sample space, 784-786
set theory, 782-784 
standard deviation, 790 
variance, 790 
probability space, 784 
prolate spheroidal coordinates, 74, 213, 
437 

proper time, 239-240 

quadrupole moment, 449 
quantization 

hydrogen atom, 679 
quantum harmonic oscillator, 667-674 
quantum mechanics 

angular momentum operator, 412 
spherical coordinates, 435 
quantum particle 

in a box, 675-677 
quantum tunneling, 676 
quaternions, 11 

radial, 19 

radial equation, 549 
raising indices, 457-459 
random event, 781 
random variable 

continuous, 801-809 
independent, 802 
transformation, 804-806
rate of change, 44 
ratio test 

Waring, 273 

recursion relation, 308, 610 
relativistic collision, 250-253 
relativistic energy, 249 
relativistic law of motion, 253-254 
relativistic momentum, 249 
relativity 

geodesic, 466 
metric connection, 466 


metric tensor, 458 
principle, 238 
special, 237 
residue, 526 

calculus, 525-536 
residue theorem, 527 
definite integral 

rational function, 529 
rational trigonometric, 532 
trigonometric function, 534 
retarded Green’s function, 712 
Ricci tensor, 470 
Riemann, 321 

Riemann curvature tensor, 468-471 
Riemann zeta function, 269 
Riemannian manifold, 456 
right-hand rule, 392 
rigid transformation, 190 
Rodrigues formula, 626 
Rosetta stone, 304 
row vector, 181 

sample space, 784-786
Savart, Felix 

biography, 115 
scalar curvature, 470 
scalar function, 445 
Schrodinger, 675 
biography, 666 

Schrodinger equation, 543, 546, 666-680 
time-independent, 666 
Schwarz inequality, 185, 220 
Schwinger, 26 
second iterate, 757 
second variation, 735-738 
self-similarity, 775 
separated boundary conditions, 574 
separation of time, 543 
separatrix, 766 
sequence, 259-262 
bounded, 261 
convergence, 260 

Cauchy criterion, 261 
divergence, 260 
functions, 274-279
limit, 260 

monotone decreasing, 261 
monotone increasing, 261 
partial sum, 259, 267 
series, 266-274 
alternating
test, 270 

application to DE, 307-311 
complex, 518 

Laurent, 518-522 
Taylor, 518-522 
convergence 
absolute, 268 
comparison test, 268 
conditional, 272 
generalized ratio test, 270 
integral test, 268 
n-th term test, 267 
ratio test, 269 
convergent 
grouping, 273 
rearranging, 273 
familiar functions, 287-291 
Fourier, 299-303 

complex numbers, 489 
Fourier-Bessel, 655 
functions, 274-279

uniform convergence, 276 
geometric, 271 
harmonic 
order p, 269 
Laurent 

complex, 518 
Maclaurin, 287 

binomial function, 288 
complex, 518 
exponential function, 287 
hyperbolic function, 289 
logarithmic function, 291 
trigonometric function, 287 
operations on, 273-274 
power, 283-299 

differential equations, 307 
Taylor, 286-287 
complex, 518 
multivariable, 305-307 
uniform convergence 
differentiation, 278 
integration, 278 
uniformly convergent, 277-279 
set theory, 782-784 
complement, 783 
difference, 783 
disjoint sets, 783 
intersection, 783 


union, 782 
Venn diagrams, 783 
sine transform, 697 
soap film problem, 733 
SOLDE, 563-570 

basis of solutions, 565 
central force, 579 
constant coefficient, 575-587 
homogeneous, 576-583 
inhomogeneous, 583-587 
homogeneous, 564 

second solution, 567-569 
inhomogeneous 

general solution, 569-570 
Kepler problem, 580 
linearity, 564-565 
normal form, 563 
singular point, 563 
superposition, 564-565 
superposition principle, 564 
uniqueness of solution, 564-565
uniqueness theorem, 565 
variation of constants, 569 
Wronskian, 566-567 
solid angle, 344-350
total, 349 

source point, 25, 79 
space 

dimension, 11 
flat, 470 
point, 11 
probability, 784 
spacetime distance, 240-242 
being zero, 242 
span, 175 

special functions, 550 
standard basis, 216 
standard deviation, 790 
statistical independence, 788 
statistics 

Bose-Einstein, 792 
Fermi-Dirac, 791 
Maxwell-Boltzmann, 791
stellar equilibrium, 415 
step function, 152-153 

Dirac delta function, 153 
Laplace transform, 713 
Stifel, 481 
Stirling, 320 

Stirling approximation, 319, 792, 808 






Stokes’ theorem, 391-398 
Stokes, George Gabriel 
biography, 398 
strange attractor, 778 
Sturm-Liouville 
system, 574 

Sturm-Liouville equation, 574 
subset, 782 
success excess, 793 
summation, 262-266 
superposition principle, 25, 564 
surface 

primary, 60 

Sylvester, and Cayley, 192 
Sylvester, James Joseph
biography, 210 
symmetric matrix, 182 
symmetric tensor, 452 

Taylor series, 286-287 
complex, 518-522 
multivariable, 305-307 
Taylor, Brook 

biography, 294 
tensor, 445-454 
addition, 450 

algebraic properties, 450-452 
contraction, 451 
differentiation, 462-468 
Einstein curvature, 471 
electrodynamics, 459-461 
energy momentum, 471 
Levi-Civita symbols, 453 
metric, 454-461
definition, 456 
relativity, 458 
multiplication, 451 
numerical, 452-454 
rank of, 448 
Ricci, 470 

Riemann curvature, 468-471 
symmetrization, 452 
torsion, 463 
terminal velocity, 559 
theorem 

central limit, 809 

time 

coordinate, 239-240 
proper, 239-240 
time constant, 586 


toroidal coordinates, 74, 213, 437 
torque, 28 
torsion tensor, 463 
transfer function, 586 
transform 

cosine, 697 
Fourier, 693-712 

and quark model, 702 
application to DEs, 702-704 
convolution theorem, 724 
examples, 698-702 
Gaussian, 699 
Green’s functions, 705-712 
heat equation in 1D, 704
inverse, 695 
of delta function, 698 
properties, 696 
integral, 693 
Laplace, 712-723

and differential equations, 718-721

Bromwich contour, 722 
convolution, 716 
cosine, 713 
derivative, 717-718 
first shift, 714 
gamma function, 713 
imaginary exponential, 713 
integral, 717-718 
inverse, 721-723
linearity, 714 

Mellin inversion integral, 722 
periodic functions, 716 
properties, 713-717 
second shift, 714 
sine, 713 

step function, 713 
unit function, 713 
sine, 697 
transformation 
active, 178 
coordinate, 13 
differentiation, 197 
gauge, 418 
Legendre, 54, 748 
linear, 216 
Lorentz, 243-247 
matrix 

in space, 195 
orthogonal, 442 
passive, 178
rigid, 190 

transient term, 586 
transpose 

of a matrix, 181 
transposition, 181 
properties, 182 
triangle inequality, 480 
tunneling, 676 

uncertainty relation, 699 
uniform convergence 

Weierstrass M-test, 276 
uniformly convergent series, 277-279
union, 782 
unit matrix, 180 
unit vectors, 5 
universal set, 782 
partition, 785 

Van de Graaff, 117
Vandermonde, 210 
variable 

random 

continuous, 801-809 
transformation, 804-806 
variance, 790, 801 
variational problem, 728-740
constraints, 738-740
several dependent variables, 734 
several independent variables, 734 
soap film, 733 
vector 

Cartesian 

component, 216 
n-dimensional, 216 
column, 177 
component, 176 
contravariant, 445-447 
coordinate system, 16-31 
covariant, 445-447
cross product, 1-10 
field 

conservative, 398-404 
curl, 391-398 
flux, 365-369 
G-orthogonal, 219 
in space, 199 
indices, 439-471


inner product, 182-191, 198-202
plane, 3-10, 174-191
position, 19 
row, 181 

space, 3-10, 192-201
time derivative, 350-355 
transformation, 194-198
transformation of components, 176-182

transformation properties, 441-445
unit, 5 

vector analysis 

curvilinear, 4 % 3~435 
vector field 

conservative 
curl, 400 
curl of, 394 
divergence, 374 
vector potential, 408 
vector space, 173, 215-221 
velocity, 44 

terminal, 559 
Venn diagrams, 783 
vibrating membrane, 686-687
Vieta, 481 

Wallis, 97, 293, 321, 326 
Wallis, John 

biography, 90 

wave equation, 543, 680-681 

advanced Green’s function, 712 
from Maxwell’s equations, 417 
Green’s function, 711-712
retarded Green’s function, 712 
wave guide, 682-686 
cylindrical, 686 
longitudinal part, 682 
rectangular, 685 
transverse part, 682 
weight function, 227 
Wheatstone, 382 
Wronskian, 566-567

Yukawa potential, 700 

zero mass, 250 
zero matrix, 180 
zero spacetime distance, 242 
zeta function, 269 




